The MIT Press Essential Knowledge Series
A complete list of the titles in this series appears at the back of this book.
Panos Louridas
The MIT Press | Cambridge, Massachusetts | London, England
© 2020 Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.
This book was set in Chaparral Pro by New Best-set Typesetters Ltd.
Library of Congress Cataloging-in-Publication Data
Names: Louridas, Panos, author.
Title: Algorithms / Panos Louridas.
Description: Cambridge, Massachusetts : The MIT Press, [2019] | Series: The MIT Press essential knowledge series | Includes bibliographical references and index.
Identifiers: LCCN 2019040771 | ISBN 9780262539029 (paperback)
Subjects: LCSH: Algorithms—Popular works. | Computer algorithms—Popular works.
Classification: LCC QA76.9.A43 .L668 2019 | DDC 005.13—dc23
LC record available at https://lccn.loc.gov/2019040771
10 9 8 7 6 5 4 3 2 1
The world is untranslatable but it is not incomprehensible, as long as you know the simple rule that nothing of what it expresses through its myriad lives and creatures is followed by a question mark, only by exclamation marks.
—Karl Ove Knausgaard, Summer
The MIT Press Essential Knowledge series offers accessible, concise, beautifully produced pocket-size books on topics of current interest. Written by leading thinkers, the books in this series deliver expert overviews of subjects that range from the cultural and the historical to the scientific and the technical.
In today’s era of instant information gratification, we have ready access to opinions, rationalizations, and superficial descriptions. Much harder to come by is the foundational knowledge that informs a principled understanding of the world. Essential Knowledge books fill that need. Synthesizing specialized subject matter for nonspecialists and engaging critical topics through fundamentals, each of these compact volumes offers readers a point of access to complex ideas.
I know two young teenagers who possess more knowledge than any scientist, philosopher, or scholar of ages past. They are my sons. No, I am not a doting father who marvels at how extraordinarily gifted his children are. But these two kids have in their pockets devices that connect them with the vastest repository of information that has ever been created. There is no factual question they cannot answer, now that they have mastered the art of knowing where to look on the internet. They can translate from and to foreign languages without having to browse through hefty dictionaries—which we still keep in the house so that they know how things were, only a few years back. News, from anywhere, reaches them in an instant. They can communicate with their peers before you know it, no matter where in the world they may live. They can plan their outings in perfect detail. Alas, they can also waste their time with abandon, playing games or following trends that change so fast that I do not know why they matter.
All the above have become possible thanks to the huge advances in digital technology. Today we carry more computing power in our pockets than was used to ferry humans to the moon. As these two teenagers show, the changes in our lives have been immense; predictions for the future vary from utopias, where people will really not need to work, to dystopias, where the privileged few will lead fulfilling lives, with the rest being condemned to inconsequential torpor. Thankfully, we are able to shape this future, and an important factor in our ability to do this is how conversant we are with the technologies that underlie the achievements and the changes before us. Although we may lose sight of it in the bustle of our everyday lives, we live in the best period of human history. We are healthier than we have ever been, and expect to live longer, on average, than any generation that has ever lived. Despite the iniquity of glaring inequality, huge swathes of humanity have gotten rid of the shackles of poverty. We have never been closer to one another, both virtually and literally. We may decry the commercialism of mass global tourism, but cheap travel allows us to experience different cultures and visit places that we could once marvel about only in coffee table books. All this progress can and should continue.
To partake in this progress, however, it is not enough to use digital technology. We must be able to understand it. First, for the eminently practical reason that it offers excellent career opportunities. Second, because even if we don’t care for a career in technology, we must know its underlying principles to appreciate its potential and shape our own role in it. Digital technology is enabled as much by its hardware, the physical components that make up computers and digital devices, as by its software, the programs that run on it. The backbone of programs is the algorithms that they implement: the sets of instructions that describe the way to solve particular problems (if this does not look like a definition of what an algorithm is, don’t worry, we have the rest of the book to fill out the details). Without algorithms, computers would be useless, and none of modern technology would exist.
What we need to know changes through time. For most of human history, schooling was not deemed necessary at all. Most people were illiterate, and if they were taught something, it would be mastery of some practical skill or scripture. In the beginning of the nineteenth century, more than 80 percent of the world’s population was completely unschooled; now the vast majority has attained several years of school, and it is projected that by the end of the century, the proportion of unschooled people in the world will fall to zero. The years we spend on education have also increased. While in 1940 less than 5 percent of Americans had a bachelor’s degree, by 2015 almost a third of them did.1
Back in the nineteenth century, no school would teach molecular biology because nobody knew anything about it; DNA wasn’t discovered until well into the twentieth century. It now forms part of what we accept as the canon of an educated person’s learning. Similarly, even though algorithms were discovered in antiquity, few people troubled with them until the advent of modern computers. The author firmly believes that we have reached a point where algorithms are at the core of what we consider to be essential knowledge. Unless we know what they are and how they work, we cannot understand what they can do, how they can affect us, what to expect from them, what their limits are, and what they require in order to work. In a society that increasingly functions thanks to algorithms, it behooves us as informed citizens to be knowledgeable about them.
It is also possible that learning algorithms helps us in another way. If learning mathematics introduces us to a way of rigorous reasoning, a familiarity with algorithms introduces us to a new way of algorithmic thinking: a way of reasoning to solve problems in a practical way so that efficient implementations of algorithms as programs can run fast in computers. The focus on designing processes that are practical and efficient can be a useful mental tool, even if we are not professional programmers.
This book aims to introduce algorithms to a nonspecialist audience in a way that the reader will understand how they really work. Its purpose is not to describe the effects of algorithms in our lives; there are other books that do a great job of depicting how improved processing of big data, artificial intelligence, and the weaving of computing devices into the fabric of our everyday lives may change the human condition. Here we are not interested in what may happen but rather in how it can happen. To do that, we’ll present real algorithms and show not only what they do but also how they actually function. Instead of hand waving, we’ll provide detailed explanations.
To the question, “What are algorithms?” the answer is surprisingly simple. They are particular ways to solve our problems. These ways to solve our problems can be described in easy steps so that computers can execute them with amazing speed and efficiency. Yet there is nothing magical about these solutions. The fact that they comprise simple elementary steps means that there is no reason why they should be beyond the grasp of most people.
Indeed, the book does not assume knowledge of material beyond that commonly taught in high schools. Some mathematics does appear in the following pages because you cannot talk seriously about algorithms without some notation. Any concepts that are commonplace in algorithms but are not that common outside computer science are explained in the text.
The late physicist Stephen Hawking wrote in the introduction of his best-selling book A Brief History of Time, published in 1988, “Someone told me that each equation I included in the book would halve the sales.” This sounds pretty ominous for the present book because mathematics does occur more than once. Yet I decided to press ahead, for two reasons. First, while the level of mathematics required for Hawking’s physics is at the university level or beyond, the mathematics presented here is much more accessible. Second, as the purpose of this book is to show not just what algorithms are for but how they really work too, the reader should get to share some of the vocabulary we use when we discuss algorithms. And this vocabulary does include some mathematics. The notation is not the prerogative of the technical clerisy, and familiarity with it will help dispel any mystique surrounding the subject; in the end, we’ll see that it mostly comes down to being able to talk about things in a precise quantitative way.
It is impossible to cover the whole subject of algorithms with a book like this, but it is possible to provide an overview and introduce a reader to algorithmic thinking. The first chapter lays the ground by introducing what algorithms are and how we can gauge their efficiency. We can say at the outset that an algorithm is a finite sequence of steps that we can perform with a pen and paper, and this plain definition would not be far from the truth. Chapter 1 starts from there, while also exploring the relationship between algorithms and mathematics. A key difference between the two is practicality; in algorithms, we are interested in practical ways to solve our problems. This means that we need to be able to measure how practical and efficient our algorithms are. We’ll see that these questions can be carefully framed through the notion of computational complexity; this will inform the discussion of algorithms in the rest of the book.
The next three chapters look at three of the most essential application areas of algorithms. Chapter 2 covers algorithms that deal with the solution of problems relating to networks, called graphs, of things. These problems may include finding your way in a road network or the sequence of links connecting you to somebody on a social network. They also include problems in areas whose relationship to graphs is not immediately obvious: DNA sequencing and the scheduling of tournaments; this will illustrate that distinct problems can be solved efficiently using the same tools.
Chapters 3 and 4 explore how to search for things and put things in order. These may seem prosaic, yet they are among the most important applications of computers. Computers spend a lot of time sorting and searching, but we are largely oblivious to this fact exactly because they are an integral, invisible part of most applications. Sorting and searching also offer us a glimpse of an important facet of algorithms. For many problems, we know of more than one algorithm to solve them. We choose among the available algorithms based on their particular characteristics; some algorithms are more suitable for certain problem instances than others. It is therefore instructive to see how different algorithms, with different characteristics, go about solving the same problem.
The following two chapters present important applications of algorithms on a large scale. Chapter 5 picks up graphs again to explain the PageRank algorithm, which can be used to rank web pages in order of significance. PageRank was the algorithm used by Google when it was founded. The success of the algorithm at ranking web pages in search results played a critical role in the phenomenal success of Google as a company. Fortunately, it is not difficult to grasp how PageRank works. It will give us the opportunity to see how an algorithm can solve a problem that, on first impression, does not seem amenable to a computer solution: How do we judge what is important?
Chapter 6 introduces one of the most active areas in computer science: neural networks and deep learning. Successful applications of neural networks are reported in popular media. Stories pique our interest by describing systems that perform tasks such as image analysis, automatic translation, or medical diagnosis. We’ll start out simple, from individual neurons, building up bigger and bigger neural networks that are able to perform more and more complex tasks. We’ll see that they all work based on some fundamental principles. Their efficacy arises from the interconnection of many simple components and the application of an algorithm that lets neural networks learn.
After sketching what algorithms can do, the epilogue explores the limits of computation. We know that computers have performed amazing feats and expect so much more from them in the future, yet are there things that they cannot do? The discussion of the limits of computing will allow me to offer a more precise explanation of the nature of algorithms and computing. We said that we could describe an algorithm as a finite sequence of steps that can be performed with a pen and paper, but what kind of steps could these be? And how close is the pen-and-paper analogy with what algorithms really are?
First and foremost, I am grateful to Marie Lufkin Lee at the MIT Press for coming up with the idea for this book, Stephanie Cohen for goading me gently through the process, Cindy Milstein for her meticulous editing, and Virginia Crossman for her excellent attention to detail and taking care of everything. A book on algorithms should be part of the Essential Knowledge series, and I am proud that I am the one to write it.
I extend my thanks to Diomidis Spinellis for commenting on parts of the book, and my special appreciation to Konstantinos Marinakos, who read the manuscript, spotted embarrassing bugs, and offered generous suggestions for improvements.
Finally, I want to express my gratitude to two teenagers, Adrianos and Ektor, whose lives will to such an extent be determined by the subject matter of this book, and their mother, Eleni; they enabled me to make this happen.
We like putting labels on time periods, perhaps because affixing a tab on time allows us to get a grip on its fluidity. We have therefore started speaking of the present as the dawning of a new algorithmic age, in which algorithms will reign supreme, and will govern larger and larger parts of our lives. It is interesting that we are not talking about the computer age or internet age anymore. We somehow take them for granted. It is when we add algorithms that we begin intimating that perhaps something qualitatively different has started taking place. “Behold the Almighty Algorithm, a snippet of computer code coming to stand for a Higher Authority in our secular age, a sort of god,” says Christopher Lydon, former New York Times journalist and host of the Radio Open Source show. And indeed, algorithms are taken to be some form of higher authority when they are used to organize political campaigns, follow our traces in the online realm, shadow our shopping and target us with advertising, suggest dating partners, or monitor our health.1
There is an aura of mystery around all that, which perhaps flatters the acolytes of algorithms. Being described as a “programmer” or “computer scientist” marks you as a decent, albeit somewhat technical, character. How much better to be a member of the tribe that is about to change almost everything in our lives?
There is definitely a sense in which algorithms are a sort of god. They are mostly held unaccountable, like gods; things happen, not because of human agency, but because they were decided by an algorithm, and the algorithm sits beyond the pale of responsibility. Machines, running algorithms, can surpass human performance in more and more fields so that it appears that the area of human superiority is reduced day by day; some believe that the day when computers will be able to surpass humans in every aspect of cognition is not far away.
But there is also a sense in which algorithms are nothing like gods, although we often lose sight of it. An algorithm does not produce its results by an act of revelation. We know exactly the rules that it follows and the kinds of steps it takes. No matter how wonderful the outcome, it can always be traced back to some elementary operations. To people who are newcomers to algorithms, it may come as a surprise how elementary these may be. That is not to besmirch algorithms; seeing how something really works may take away some of its mystique. At the same time, understanding how something works may allow us to appreciate the elegance of its design, even if it is no longer mysterious.
The premise of this book is that indeed algorithms are not mysterious. They are tools that allow us to do certain things well; they are specific kinds of tools whose purpose is to allow us to solve problems. In this way they are cognitive tools; as such, they are not the only ones. Numbers and arithmetic are also cognitive tools. It took us thousands of years to evolve a number system that children can learn in school so that they can perform calculations that would be impossible without it. Now we take numeracy for granted, but a few generations back only a small minority of humans had any knowledge of it.
Similarly, knowledge of algorithms should not be the prerogative of a small elite minority; as cognitive tools they can be apprehended by all kinds of people, not just computer professionals. What is more, they should be understood by more people because that will allow us to put algorithms into perspective: to know what they do, how they do it, and what we can realistically expect them to do.
An essential knowledge of algorithms is what we are after here so that we can take a meaningful part in the conversations on the algorithmic age. That is not an age that is thrust on us, but one of our own creation, based on tools that we have devised. The study of these tools is the subject of this book. Algorithms are beautiful tools, and a glimpse of how they are made and work can enhance our way of thinking.
We’ll start by dispelling an irksome notion: that algorithms are about computers. This, we’ll see, makes as much sense as saying that numbers are about calculators.
A pen-and-paper puzzle, music, divisibility of numbers, and neutron accelerators in particle physics—we’ll see that what they all have in common is the same algorithm, applied to such different domains, yet working on the same underlying principles. How can this be?
The word “algorithm” itself does not reveal its meaning. It comes from the name of Muḥammad ibn Mūsā al-Khwārizmī (ca. 780–ca. 850), a Persian scholar who worked on mathematics, astronomy, and geography. Al-Khwārizmī’s contributions were many and widespread. The term “algebra” comes from the Arabic title of his most influential work, The Compendious Book on Calculation by Completion and Balancing. His second most influential book, On the Calculation with Hindu Numerals, was on arithmetic and, translated into Latin, introduced the Hindu-Arabic numeral system to the West. Al-Khwārizmī’s name was latinized to Algorismus, which came to denote the method of numerical computation with decimal numbers. Algorismus, influenced by the Greek word for “number” (arithmos, as in arithmetic), became algorithm, still denoting decimal arithmetic, before acquiring its modern sense in the nineteenth century.
You could be tempted to think that algorithms are something that we do with computers, but this would be wrong. It is wrong because we had algorithms long before we had computers. The first-known algorithms date back to ancient Babylon.2 It is also wrong because algorithms are not about problems that have to do with computers. Algorithms are about doing something in a specific way, following some kind of steps. That is somewhat vague. You may ask, What kind of steps? What specific way? We can dismiss all vagueness, and give a precise mathematical definition of what an algorithm is and what it does—such a definition does exist—but we don’t need to go to such lengths. You may be happy to know that an algorithm is a set of steps that you can follow with pen and paper, and you can be assured that this seemingly facile description is close to those used by mathematicians and computer scientists.
So we can start our approach to algorithms with a problem that we can solve by just writing things down. Suppose we have two sets of objects and want to spread the objects of one of the two sets as evenly as possible among the objects of the other set. We will use crosses (×) for the objects of the first set and bullets (•) for the objects of the second set. We want to spread out the crosses among the bullets.
If the number of crosses divides the total number of objects, that is easy. We just partition the crosses among the bullets as if we were doing division. For example, if we have 12 objects in total, out of which three are crosses and nine are bullets, we put one cross, then three bullets, then one cross, three bullets, and finally another cross and three bullets:

× • • • × • • • × • • •
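This easy case amounts to starting each group of total ÷ crosses objects with a cross. A minimal Python sketch (the function name and the use of 'x' for a cross and '.' for a bullet are our own notation, not the book's):

```python
def even_spread(crosses, total):
    """Spread crosses evenly among bullets when crosses divides total.

    Each cross starts a group of total // crosses objects; the rest of
    the group is filled with bullets.
    """
    group = total // crosses  # assumes crosses divides total exactly
    return ("x" + "." * (group - 1)) * crosses

print(even_spread(3, 12))  # the 12-object example: x...x...x...
```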
But what if the total number of objects, crosses and bullets taken together, cannot be divided exactly by the crosses? What can we do if we have five crosses and seven bullets?
We start by putting all the crosses followed by all the bullets in one row as follows:

× × × × × • • • • • • •
Then we take five bullets and place them under the crosses:

× × × × × • •
• • • • •
We notice in the pattern that emerges that we have a remainder of two columns to the right. We take the two remainder columns, each comprising a single bullet, and put them under the first two columns, forming a third row:

× × × × ×
• • • • •
• •
Now we notice that we have a remainder of three columns. We take the rightmost two of them and put them under the two leftmost columns:

× × ×
• • •
• •
× ×
• •
Now we have only one remainder column, so we stop. We concatenate the columns, reading each one from top to bottom and taking the columns from left to right, and get:

× • • × • × • • × • × •
This is the result. We have distributed the crosses among the bullets. They are not as evenly spaced as before, but that is impossible to do because, remember, five does not divide evenly into 12. We have managed to avoid heaping all the crosses together, however, and have created a pattern that does not look entirely haphazard.
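The column-moving procedure just described can also be written as a short program. The following Python sketch mirrors the steps above, folding the remainder columns under the leading columns until at most one remainder column is left (the function name is our own; 'x' stands for a cross and '.' for a bullet):

```python
def spread(onsets, total):
    """Spread `onsets` crosses as evenly as possible over `total` positions,
    following the column-folding procedure described in the text."""
    a = [["x"] for _ in range(onsets)]          # columns headed by a cross
    b = [["."] for _ in range(total - onsets)]  # the remainder columns
    # Repeatedly place the remainder columns under the leading columns,
    # stopping when at most one remainder column is left.
    while len(b) > 1:
        moved = min(len(a), len(b))
        paired = [a[i] + b[i] for i in range(moved)]
        # Whatever could not be paired becomes the new remainder.
        leftover = a[moved:] if len(a) > moved else b[moved:]
        a, b = paired, leftover
    # Concatenate the columns from left to right, top to bottom.
    return "".join(ch for column in a + b for ch in column)

print(spread(5, 12))  # x..x.x..x.x.
```

Reading 'x' as DUM and '.' as da, `spread(5, 12)` gives exactly the DUM-da-da-DUM-da-DUM-da-da-DUM-da-DUM-da pattern derived by hand above.
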
You may wonder if there is anything particular about this pattern; it helps if you substitute DUM for the cross and da for the bullet. Then the pattern goes DUM-da-da-DUM-da-DUM-da-da-DUM-da-DUM-da and it really is a rhythm. A rhythm is constituted by accented parts, also called onsets, and unaccented or silent parts. The rhythm we found is not a rhythm of our own devising. It is used by the Aka pygmies in the Central African Republic; it is the clapping, called Venda, of a South African song; it is also a rhythm pattern used in Macedonia, in the Balkans. There is more. If we rotate it, so that it starts at the second cross (that is, onset), then it becomes:

× • × • • × • × • × • •
That is the Columbia bell pattern, popular in Cuba and West Africa, as well as a drumming pattern in Kenya, while it is also used in Macedonia (again). If we rotate it to start on the third, fourth, and fifth onset, other popular rhythms around the world emerge.
Is this just a one-off thing? We can try to create a 12-part rhythm out of seven onsets and five silent parts—kind of mirroring the five onsets and seven silent parts that we had before. If we follow exactly the same procedure, we will arrive at:
This, again, is a rhythm. It is used in the Mpre rhythm of the Ashanti in Ghana, and if we start it on the last onset, it is used by the Yoruba in Nigeria as well as in Central Africa and Sierra Leone.
Lest you think we have geographic omissions, if we start with five beats and 11 silent parts, we arrive at the following:
That is the Bossa-Nova rhythm, rotated. The actual Bossa-Nova rhythm starts on the third onset, so the exact correspondence is:
If we try with three beats and four silent parts, we get the pattern:
This rhythm in a seven/four meter is popular, and not just in traditional music. Among other tunes, it is the rhythmic pattern of Pink Floyd’s song “Money”:
Many more rhythms can be derived in this way by putting crosses and bullets in columns and moving them around in the way we just described. We illustrated the procedure by measuring remainder columns, but this is really a pictorial way of showing what actually happens. Instead of creating columns, checking the geometry, and moving them around, we can do the same thing more formally with simple numerical operations. To see how, let's return to the example of 12 parts and seven onsets. We start by dividing 12 by 7, which gives us quotient 1 and remainder 5:
This tells us to put the seven onsets in the beginning, creating seven columns of onsets, followed by a remainder of the five unaccented parts:
Now we divide again, but this time we divide the divisor of the previous division, 7, by the remainder of the previous division, 5. This gives us a quotient of 1 again while the remainder is 2:
This means that we need to take the five rightmost columns and place them under the five leftmost columns, leaving a remainder of 2:
We repeat the same step: we divide the divisor of the previous division, 5, by the remainder of the previous division, 2. The quotient is 2 and the remainder is 1:
This tells us to take twice the two rightmost columns and place them under the two leftmost columns, leaving a remainder of 1:
Note that "twice" means this is equivalent to what we would have done in two steps had we worked as before, without using the division. We would go from:
first to:
and then to:
If we concatenate the columns, we get the Mpre rhythm:
We can write down the method we followed in more precise terms as the following steps. We assume that we start with two numbers, a and b. We let a be the total number of parts. If the number of onsets is greater than the number of silent parts, then b is the number of onsets; otherwise, b is the number of silent parts. At the beginning, we create a row with the onsets followed by the silent parts.
In these two steps we perform a division repeatedly, until it does not make sense to repeat it. You can trace the steps we take in the following table, where we start with a = 12 and b = 7, like we did before; in each row we have a = q × b + r:
If you examine the table, you can verify that each row corresponds to one step of the column formation and moving, but now we have a more precise definition of the method we used. In fact, we have a series of steps that we can perform with pen and paper, so this is our first algorithm! We have an algorithm for creating patterns that correspond to many, indeed surprisingly many, musical rhythms. Working with different numbers of onsets and silent parts, we can get about 40 rhythmic patterns that are found in different rhythms around the world. That should give us pause for a minute: it is a simple algorithm (only two steps, repeated), and yet it is able to produce so many interesting results.
Our algorithm does more than that, though. As we are talking about the division of two numbers, let us consider the following general problem: if we have two numbers a and b, what is the greatest number that divides them both? This is called the greatest common divisor, or gcd, of the two numbers. We encounter the greatest common divisor in elementary arithmetic, in problems such as: if we have 12 packets of crackers and four packets of cheese, how do we distribute them into baskets so that each basket has the same proportion of crackers and cheese? As four divides 12, you will have four baskets, each containing three packets of crackers and one packet of cheese; the greatest common divisor of 12 and four is four. Things get more interesting if you have 12 packets of crackers and eight packets of cheese. You cannot divide one number by the other, but the greatest number that divides both 12 and eight is four, which means that you will again make four baskets, each containing three packets of crackers and two packets of cheese.
So how can we find the greatest common divisor of any two integers? We have seen that if one of the numbers divides the other, that number is the greatest common divisor. But if that does not happen, then it turns out that in order to find the greatest common divisor of two numbers, we only need to find the greatest common divisor of the remainder of the division of the two numbers and the second number. This is actually easier to see with symbols. If we have two integers a and b, the greatest common divisor of a and b is equal to the greatest common divisor of the remainder of the division of a by b, and b. This brings us back to our rhythms. The way we have been finding rhythms is in fact the same way we use to find the greatest common divisor of two numbers.
The method for finding the greatest common divisor of two numbers is called Euclid's algorithm, in honor of Euclid, the ancient Greek mathematician who first described it in his Elements (ca. 300 BCE). The basic idea is that the greatest common divisor of two numbers remains the same if we replace the larger of the two with its difference from the smaller. Take 56 and 24. Their greatest common divisor is 8, which is also the greatest common divisor of 56 − 24 = 32 and 24, and the same goes for 32 and 24, and so on. Repeated subtraction is really division, so Euclid's algorithm is described with the following steps:
These are essentially the same steps as before. The only difference is that when finding rhythms, in step 2 we stop when the remainder is 0 or 1, while Euclid's algorithm stops when the remainder is 0. This is really the same: if you have a remainder of 1, then the next repetition of step 1 gives a remainder of 0, because 1 divides every integer. Try 9 and 5: 9 = 1 × 5 + 4, so we go to 5 and 4; then 5 = 1 × 4 + 1, so we go to 4 and 1; and then 4 = 4 × 1 + 0, so the greatest common divisor of 9 and 5 is 1.
It may help you to see the algorithm in action with a = 136 and b = 56 in the following table, similar to the one we saw before with our rhythms. We find that the greatest common divisor of 136 and 56 is the number 8:
As we noted with 9 and 5, Euclid's algorithm works correctly in all cases, even when the two numbers have no common divisor apart from 1. This is what happened with a = 12 and b = 7. You can see for yourself what happens if you try to perform the algorithm's steps with another pair of numbers that share no divisor other than 1; it will take a few steps, but the algorithm will determine that the only common divisor is 1.
The steps in Euclid’s algorithm are performed in a well-defined order. The description of the algorithm illustrates the way its component steps are combined:
We call these three ways to combine steps control structures because they dictate which action will be performed as we carry out the algorithm. All algorithms are structured in this way. They comprise steps doing calculations and processing data; these steps are assembled together and choreographed using these three control structures. More complex algorithms have more steps, and their choreography may be more complex. But the three control structures suffice to describe the way the steps of any algorithm should be put together.
The steps of an algorithm will, among other things, operate on the input we provide. The input is the data that are processed by the algorithm. If we adopt a data-centric view, we use an algorithm to transform some data, which describe a problem, to some form that corresponds to the problem’s solution.
We found an algorithm behind musical rhythms that is an application of division, but in reality, we need not look that far; the act of division itself is an algorithm. Even if you have not heard of Euclid’s algorithm, you know how to divide two large numbers; we have all spent time in our early years learning to perform long multiplication and long division. Our teachers spent hours drilling into our heads how to perform these operations: a set of steps for putting numbers in the right places and doing things with them—they are algorithms. But algorithms are not simply about numbers, as we have just seen. We just found that they are about how we can produce music. Yet there is nothing mystifying about that. A rhythm is a way to distribute stresses in a time interval, and the same principle is at work when we pack crackers and cheese.
The application of Euclid's algorithm to rhythms had an unlikely source: a neutron source facility at the Oak Ridge National Laboratory in Tennessee. The Spallation Neutron Source (SNS) there produces intense pulsed neutron beams that are used in particle physics experiments. (The verb to spall means to break a material into smaller pieces; in nuclear physics, it describes a heavy nucleus emitting a large number of protons and neutrons after being bombarded with a high-energy particle.) In the operation of the SNS, some components, such as high-voltage power supplies, should run so that pulses are distributed in timing slots as evenly as possible. An algorithm devised to do the distribution is essentially the same as the rhythm-making algorithm and Euclid's algorithm, taking us from numbers to subatomic particles to music.3
We said that algorithms are not about computers, yet today most people bundle them together. It is true that algorithms show their potential when they are coupled with computers, but a computer is really a machine with the special trait that we can order it to do certain things. We order it by programming it, and usually we program it to execute algorithms.
Which brings us to programming itself. Programming is the discipline of translating our intentions to some notation that a computer is able to understand. We call this notation a programming language because sometimes it does look like we are writing in a human language, but programming languages are fairly simple affairs compared to the richness and complexity of human languages. Now, of course, a computer does not really understand anything. Things may change in the future, if we are able to produce truly intelligent machines, but right now when we say that a computer understands a notation, it really means that the notation is converted to a series of instructions for manipulating current in electronic circuits (we may also use light instead of electric current, yet the idea is the same).
Programming is the discipline of translating our intentions to some notation that a computer is able to understand. We call this notation a programming language.
If an algorithm is a set of steps we can carry out ourselves, programming is the activity by which we write down the steps in the notation that the computer understands. Then it is the computer that will carry them out. Computers are much faster than human beings, so they can execute the steps in less time. The fundamental factor in computing is speed. A computer cannot do something qualitatively different from what we humans can do, but it can do it faster—a lot faster. An algorithm gains power on a computer because it can be executed there in a fraction of the time it would take us to perform the same steps, but they are still the same steps.
A programming language gives us a way to describe to a computer the steps of algorithms. It also provides the means to structure them using the three fundamental control structures: sequence, selection, and iteration. We write the steps and describe how they are choreographed using the vocabulary and syntax provided by the particular programming language we are using.
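To make the three control structures concrete, here is a toy Python fragment of our own; the pattern string encodes the 12-part rhythm from earlier, with x for an onset and . for a silent part:

```python
# Sequence: statements execute one after another, top to bottom.
pattern = "x..x.x..x.x."
onsets = 0

# Iteration: repeat a step for every part of the pattern.
for part in pattern:
    # Selection: choose what to do based on a condition.
    if part == "x":
        onsets += 1

print(onsets)  # 5 accented parts in the rhythm
```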
If an algorithm is a set of steps we can carry out ourselves, programming is the activity by which we write down the steps in the notation that the computer understands.
There is an additional advantage to using computers apart from speed; if you can recall how you learned to perform long multiplication and division, it may have taken a lot of practice, and may not have been that exciting. As we noted above, these things are drilled into our heads at an early age, and drilling inside a head is not a pleasant procedure. Computers do not suffer from boredom, so an added reason to have them perform algorithms is to take out the tedium and leave us time to do more interesting things.
Although an algorithm is usually executed on a computer, after being written in a programming language, it is primarily written for humans, who must understand how it works and when it can be used. This brings us to something essential that even experienced computer scientists and seasoned programmers forget. The only way to truly understand an algorithm is to perform it by hand. We must be able to execute the algorithm, in the same way the computer would execute a program that implements it. In this day and age, we are privileged to have at our disposal an amazing array of media that can help us learn: superb videos, animations, and simulations are one click away. All these are great, but when you are stuck, have your pen and pad nearby. The same applies to these very lines. Have you really understood how you can create rhythms? Did you try to create one? Can you find the greatest common divisor of 252 and 24?
All programs implement a set of steps to do something, so we could be tempted to say that all programs are algorithms. We are a bit stricter, however, and want our steps to meet certain characteristics:4
These characteristics ensure that the algorithm does something. An algorithm exists because it does something useful. Frivolous algorithms do exist, and computer scientists may invent useless algorithms either in jest or by mistake, but we are really interested in algorithms that have some utility to us. When working with algorithms, it is not enough to show that something can be done. We want algorithms to be of practical interest, and for that purpose they must do something well.
Therein lies a fundamental difference between algorithms and mathematics. Most early computer scientists were mathematicians, and computer science uses a lot of mathematics, but it is not a mathematical discipline. A mathematician wants to prove that something is so; a computer scientist wants to make it work.
Our first characteristic of an algorithm is that it should require a finite number of steps. That is not very precise. We do not want to have just a finite number of steps. We want to have a number of steps that is small enough to execute them in practice, so that our algorithm finishes in a reasonable amount of time. That means that coming up with an algorithm is not enough; the algorithm must also be effective in practice. Let’s see an example to illustrate the difference between knowing something and knowing how to do something efficiently. Imagine we have a grid like the following:
We want to find the shortest path from the upper-left corner of the grid to the lower-right corner, without visiting the same place twice. The length of each path is equal to the number of links between points on the grid that it traverses. Here is one way to do it: find all such paths, measure how long each of them is, and pick the shortest, or any of the shortest in case of ties. The total number of paths is 12, as you can see below:
There are five paths of length 4, so we can pick any one of them.
We are not limited to 2 × 2 grids, though. We can have 3 × 3, 4 × 4, and even larger grids. Then we discover that our method does not scale well. There are 184 paths from the upper-left corner to the bottom-right corner of a 3 × 3 grid; if we go to the 4 × 4 grid, the number of such paths increases to 8,512. The number of paths continues to increase apace—in fact, at ever larger paces—and even counting such paths is a challenge. When we reach a 26 × 26 grid, we get 8,402,974,857,881,133,471,007,083,745,436,809,127,296,054,293,775,383,549,824,742,623,937,028,497,898,215,256,929,178,577,083,970,960,121,625,602,506,027,316,549,718,402,106,494,049,978,375,604,247,408 paths. This number has 151 decimal digits and was found with a computer program implementing an algorithm; yes, we use an algorithm to understand the behavior of another algorithm.5
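Counting the paths is itself a small algorithmic exercise. Here is a backtracking sketch in Python (our own code, not the book's program; we take an n × n grid to mean n × n squares, that is, (n + 1) × (n + 1) points). It reproduces the counts for the small grids, though enumerating paths this way is hopeless for the 151-digit case:

```python
def count_paths(n):
    """Count self-avoiding paths from the upper-left to the lower-right
    corner of an n-by-n grid of squares, moving along grid links."""
    points = n + 1
    visited = {(0, 0)}

    def walk(x, y):
        if (x, y) == (n, n):  # reached the lower-right corner
            return 1
        total = 0
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < points and 0 <= ny < points and (nx, ny) not in visited:
                visited.add((nx, ny))     # step along one link
                total += walk(nx, ny)
                visited.remove((nx, ny))  # backtrack and try another way
        return total

    return walk(0, 0)

print(count_paths(2))  # 12
print(count_paths(3))  # 184
```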
The procedure of enumerating all paths and picking the shortest one is undoubtedly correct, and will always give us the shortest path—or any of the shortest paths, if there are many equally short ones—yet it is definitely impractical. It is also completely useless, as there are algorithms that will find the shortest path without having to enumerate all possible paths, thus saving a lot of time and allowing us to tackle grids of any size. Even in the largest grid above, the number of steps required to find the answer is only on the order of hundreds; we'll see how in the next chapter.
The question of what makes an algorithm practical, and in what sense one algorithm is more practical than another, is at the heart of any application of algorithms. We'll see in the rest of the book that there often exist different algorithms for solving the same problem, and we choose the algorithm that is most appropriate for the application in each particular setting. Like all tools, some algorithms are more suitable for particular cases than others. Unlike many other tools, though, we possess a well-defined way to evaluate the merits of algorithms.
When we are investigating an algorithm to solve a problem, we want to know how it is going to perform. Speed is always an important factor. We use algorithms on computers to do things faster than a human would.
As computer hardware improves, we are usually not content with knowing how a program implementing an algorithm runs on a particular computer. Our computer may be faster or slower than the one that the algorithm was measured on, and after some years, measurements of algorithms on outdated machines will have only historical interest. We need a way to measure how well an algorithm performs independent of computer hardware.
The size of the problem we are trying to solve, though, should be somehow reflected in how we measure the performance of an algorithm. We don’t really care how long it takes to sort 10 items; after all, we can do that by hand. We care how long it takes to sort a million items or more. We want a measure of how we expect an algorithm to perform in problems that are not trivial.
To do that, we need a way to quantify the size of the problems fed to algorithms. The dimension of interest varies among different problems. If we want to sort a number of items on our computer, the relevant dimension is the number of items that we want to sort (and not, say, the size or composition of the items). If we want to multiply two numbers, the relevant dimension is the number of digits of the two numbers (that also makes sense for humans: long multiplication is long because it depends on how many digits each number has). When we study a problem and candidate algorithms for tackling it, we always do so with the size of the problem in mind.
Although particular problems have different ways to assess their size, in the end, for each problem we specify its size with an integer, which we call n. Picking up the examples above, n is either the number of items to sort or the number of digits of the numbers we want to multiply. Then we want to be able to talk about the performance of algorithms tackling problems of size n.
The time required by an algorithm is related to its computational complexity. The computational complexity of an algorithm is the amount of resources it requires to run. There are two main kinds of resources here: time, how long it takes, and space, how much storage it requires in terms of computer memory.
We are focusing on time right now. As there are computers with different performance characteristics, talking about the time taken by an algorithm to run on a particular computer may give us some indication of what to expect when it runs on other computers, but we would like something more general. The speed of a computer depends on the time it takes to execute basic operations. To get around such specificities, we instead choose to talk about the number of operations required to run an algorithm, not the actual time it takes on a specific computer to run these operations.
Having said that, note that we'll be abusing terminology a bit and treating "operations" and "time" as synonyms. Although strictly speaking we should say that an algorithm requires "x operations," we'll also say that the algorithm takes "time x," to indicate that it runs in the time required to execute x operations on whatever computer the algorithm is actually run on. Even though the actual time will vary with different hardware, that does not matter when we want to compare two algorithms that run in "time x" and "time y" on the same computer, whatever computer that is.
Now we return to the size of the problem given to an algorithm. As we are interested in nontrivial problems, we won’t care about what happens with small problem sizes. We will be concerned with what happens once we reach a certain size. We won’t say exactly what this size is, but we will always assume that it is substantial.
There is a definition of complexity that has proved useful in practice. It also has a symbol and a name. We write O(·) and call it the big O notation. Inside the big O, in the place of the dot, we write an expression. The notation means that the algorithm will take time that is at most a multiple of the expression. Let us see what that means:
If we have an algorithm that has O(n) complexity, then for an input size of 10,000 we expect it to need a multiple of ten thousand steps. If the algorithm has O(n^2) complexity, for a similarly sized input we expect it to need a multiple of a hundred million steps. For many problems, this is not a large size. Computers routinely sort 10,000 items. But you can see that the scale of the number of steps represented by an algorithm's complexity can grow large.
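The arithmetic is worth spelling out; here is a tiny sketch of our own showing how linear and quadratic step counts diverge as the input grows:

```python
# Step counts implied by O(n) and O(n^2) complexity, up to a constant
# multiple: linear work versus quadratic work for growing input sizes.
for n in (10, 100, 10_000):
    print(n, n * n)
```

For n = 10,000, the second column reaches 100,000,000, the hundred million steps mentioned above.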
Here are some examples that may help you appreciate the size of some numbers that we will encounter. Take the number 100 billion, or 10^11; this is a one with 11 zeros behind it. If you take 100 billion hamburgers and lay them end to end, you can circle the earth 216 times, go to the moon, and come back.
A billion of something is usually called giga something, at least in computers. After the billion, or giga, comes the trillion, or tera, which is 1,000 billion, or 10^12. If you start counting one number per second, you will need 31,000 years to get to one trillion. Up by 1,000 again and we get to one quadrillion, 10^15, or peta; the total number of ants living on the earth is between 1 and 10 quadrillion, according to biologist E. O. Wilson. In other words, we have between 1 and 10 petaants on our planet.
After quadrillion comes quintillion, or exa; a quintillion is 10^18 and is about the number of grains of sand on 10 large beaches. For example, 10 Copacabana Beaches hold one exagrain of sand. Up again, we arrive at 10^21, one sextillion, or zetta. The number of stars in the observable universe is one zettastar. We run out of prefixes after yotta, which stands for 10^24, one septillion. But numbers can always get larger. The number 10^100 is called a googol—yes, you probably know a company that has named itself after a purposeful misspelling. And then there is 10 raised to the googol power, 10^(10^100), which is one googolplex.6
These analogies will help us appreciate the relative merits of the specific algorithms that we will examine in the rest of the book. Although in theory we could have algorithms of any kind of complexity, the algorithms we usually deal with fall into a few different groups.
The fastest family of all algorithms comprises the algorithms that run in no more than constant time, no matter what their input. We denote this complexity with O(1); for example, an algorithm that checks if the last digit of a number is odd or even will not be affected by the size of the number and will run in constant time. The 1 in O(1) follows from the fact that O(1) means that the algorithm needs no more than a multiple of one steps to run—that is, a constant number of steps.
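The odd/even check can be written so that it touches only the number's last digit, doing the same fixed amount of work no matter how large the input is. A minimal Python sketch (the function name is ours, for illustration):

```python
def last_digit_is_even(n: int) -> bool:
    """O(1): inspect only the final digit, never the whole number."""
    return n % 10 % 2 == 0

# The work done is the same for a tiny input and a googol-sized one.
assert last_digit_is_even(7) is False
assert last_digit_is_even(10 ** 100) is True
```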
Before we meet the next complexity family, we need to take a brief excursion into a particular way things can grow or shrink. If you add something many times, you multiply it. If you multiply something many times, you raise it to a power or exponentiate it. We just saw how big numbers with exponents like 10^11 (or more) can get. What is perhaps not immediately obvious is how quickly exponentiation leads to dizzying escalation—a phenomenon called exponential growth.
The probably apocryphal story about the invention of chess is illustrative. The ruler of the country where chess was invented asked its inventor what he would like for a gift (alas, it is a “he” in these stories). He replied that he would like one grain of rice on the first square of the chessboard, two on the second, four on the third, and so on. The king thought that he got off easily and granted the wish. Unfortunately, things quickly turned sour. The sequence grows in powers of two: 2^0 = 1 grain in the first square, 2^1 = 2 in the second square, 2^2 = 4 in the third square, and thus in the last square the number of grains would be 2^63, a quantity unreachable by any means (it is equal to 9,223,372,036,854,775,808, or about 9 quintillion).
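The doubling is easy to verify directly; square k, counting from zero, holds 2^k grains:

```python
# Grains on each square double: square k (counting from 0) holds 2**k grains.
squares = [2 ** k for k in range(64)]
assert squares[:3] == [1, 2, 4]
# The last square alone holds about 9 quintillion grains.
assert squares[-1] == 9_223_372_036_854_775_808
```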
Exponential growth can also help us understand why it is so difficult to fold a piece of paper many times. Each time you fold it, you double the number of layers of the folded paper. After 10 folds, you have 2^10 = 1,024 layers. If your sheet is 0.1 millimeters thick, you now have a folded wad that is over 10 centimeters thick. Apart from the sheer force you will need to fold that in two, it may not be physically possible at all to do it, because to fold something it must be longer than it is thick.7
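The arithmetic behind the folded wad, assuming a 0.1 mm sheet:

```python
thickness_mm = 0.1
layers = 2 ** 10        # each of the 10 folds doubles the layer count
assert layers == 1024
wad_mm = layers * thickness_mm
assert wad_mm > 100     # more than 10 centimeters thick
```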
Exponential growth is the reason why computers have gotten more and more powerful over the years. According to Moore’s law, the number of transistors in an integrated circuit doubles about every two years. The law is named after Gordon Moore, who founded Fairchild Semiconductor and Intel. He made the observation in 1965; the law proved prescient, so that we have gone from about 2,000 transistors in a processor in 1971 (the Intel 4004) to more than 19 billion in 2017 (the 32-core AMD Epyc).8
Having seen growth, let us explore now its opposite. If you have a multiple of something, you use division to reverse the operation and get the original value. If you have a power of something, a^x, how do you reverse the operation? The reverse of raising to a power is the logarithm.
Logarithms are sometimes taken as the boundary between mathematics for all and mathematics for the initiated; even the name has an aura of incomprehension. If logarithms seem somewhat hazy, you need to keep in mind that the logarithm of a number is the reverse of raising the number to a power. Just as when we raise to a power, we multiply repeatedly, when we take a logarithm, we divide repeatedly.
The logarithm is the answer to the question, “To which power should I raise a number to get the value I want?” The number we are raising is called the base of the logarithm. So if the question is, “To which power should I raise 10 to get 1,000?,” the answer is 3 because 10^3 = 1,000. Of course, we may want to raise a different number—that is, use a different base. The notation for logarithms is log_a(x), and it corresponds to the question, “To which power should I raise a to get x?” When a = 10, we just drop the subscript, because logarithms base 10 are common, so instead of writing log_10(x) we simply write log(x).
There are also two other common bases. When the base is the mathematical constant e, we write ln(x). The mathematical constant e, called Euler’s number, is approximately equal to 2.71828. In the natural sciences we meet e a lot, which is why ln is called the natural logarithm. The other common base is 2, and instead of writing log_2(x) we write lg(x). Base 2 logarithms are common in computer science and algorithms, but probably unused outside these fields, although we have already met them. In paper folding, if a wad of paper has 1,024 layers, it has been folded lg(1024) = 10 times. In the chess example, the number of grains of rice results from the number of doublings we perform, which are lg(2^63) = 63.
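These relationships are easy to check numerically with Python's math module (math.isclose guards against floating-point rounding):

```python
import math

# The logarithm reverses exponentiation: 10**3 == 1000, so log10(1000) == 3.
assert math.isclose(math.log10(1000), 3)
# Base 2: a wad of 1,024 layers has been folded lg(1024) = 10 times.
assert math.isclose(math.log2(1024), 10)
# The chessboard's 2**63 grains correspond to 63 doublings.
assert math.isclose(math.log2(2 ** 63), 63)
# The natural logarithm has Euler's number e ≈ 2.71828 as its base.
assert math.isclose(math.log(math.e), 1)
```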
The reason we see lg(n) a lot in algorithms is that it appears whenever we solve a problem by splitting it in two equal smaller problems; this is called divide and conquer, and it works like folding a sheet in two. The most efficient way to search for something in a sorted group of items has complexity O(lg n). That is pretty amazing; it entails that to find something among one billion ordered items, you need only about 30 probes into your items.
Algorithms that have logarithmic complexity are the next best thing after algorithms that run in constant time. Next come algorithms that run in O(n), which are called linear time algorithms because their time grows proportionally with n; that means that it grows as a multiple of n. We saw that searching for an item in an unordered set of items requires time proportional to the number of the items, O(n). See how the complexity increased compared to when the items are ordered; organizing the data of our problem can have a big impact on how it can be solved. In general, linear time is the best behavior we can expect of an algorithm if it has to read through all the inputs of the problem, as this will require O(n) time for n inputs.
If we combine linear and logarithmic times, we get loglinear time algorithms, whose time grows as n multiplied by its logarithm, O(n lg n). The best algorithms for sorting—that is, putting items in order—have complexity O(n lg n). That may look a bit surprising; after all, it can be shown that if you have n items and want to compare each item with all other items, it requires time O(n^2), which is bigger than O(n lg n).9 Also, if you have n items that you want to sort, you definitely need O(n) time to examine all of them. Sorting them requires multiplying that number by a factor smaller than n itself. We’ll see how this can be done later on in the book.
The next computational complexity family is n raised to a constant power, O(n^k); this is called polynomial complexity. Polynomial time algorithms are efficient, except if k is big, but that rarely happens. When we try to solve a computational problem, we are usually delighted if we can come up with a polynomial time algorithm.
A complexity of the form O(2^n) is called exponential complexity. Note the difference from polynomial complexity, where the exponent is constant; here it is the exponent that changes. We saw how exponential growth explodes. The universe will not survive long enough to see the answer of exponential algorithms for nontrivial inputs. Such algorithms are interesting from a theoretical point of view because they show that a solution can be found. We can then search for better algorithms with lower complexity, or we may be able to prove that no better algorithms can be found, in which case we can settle for something less than the ideal—for instance, approximate solutions.
There is something that grows even faster than exponentiation, and this is the factorial. If you have not encountered a factorial before, the factorial of a natural number n—which we write as n!—is simply the product of all the natural numbers up to and including that number: n! = 1 × 2 × ⋯ × n. Even if you have not encountered 100!, you probably have encountered 52! without knowing it. That is the number of different shuffles of a deck of cards. Algorithms whose running time is measured in factorials have factorial complexity.
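The factorial's explosive growth is easy to witness directly:

```python
import math

assert math.factorial(5) == 1 * 2 * 3 * 4 * 5
# 52!, the number of shuffles of a deck of cards, has 68 digits.
assert len(str(math.factorial(52))) == 68
```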
Although numbers like 100! may seem exotic, they arise in many nonexotic settings and not just card games. Take, for example, the following problem: “If we have a list of cities and the distances between each pair of them, what is the shortest possible route that one should take to visit each city once and return to the origin city?” This is called the traveling salesman problem, and the obvious way to solve it is to examine every possible path taking in all cities. Unfortunately, for n cities this number is n!. The problem is unmanageable after, say, 20 cities. There are some algorithms that do a bit better than O(n!), but not enough to be practical. Surprising as it may seem for such a straightforward problem, the only way we can solve it in an acceptable time is by finding a solution that may not be the optimal one, but is close enough to it. Many other problems of great practical importance are intractable—that is, we don’t know a practical algorithm to solve them exactly. Even so, the quest for better and better approximation algorithms is a vibrant field in computer science.
In the table that follows, you can see the value of various functions, falling under the complexity families we presented, for different values of n. The first row gives the n values and also stands in for linear complexity; subsequent rows show families of increasing complexity. As n increases, the function values increase, but the way they increase is different. The function n^3 will take us from one million to one quintillion, but that is nothing compared to 2^n or 100! We have left a blank line after the n^k row, separating practical from impractical algorithms. The border between the two is the polynomial algorithms, which, as we saw, are of practical use. Algorithms with higher complexity are usually not of practical use.
In the eighteenth century, the good citizens of Königsberg strolled around their city on Sunday afternoons. The city of Königsberg was built on the banks of the river Pregel. The river created two large islands within the city; the islands were connected to the mainland and each other with seven bridges in total.
Swept by the vagaries of European history, Königsberg passed from the Teutonic Knights, to Prussia, Russia, the Weimar Republic, and Nazi Germany, and after the Second World War, it became part of the USSR and was renamed Kaliningrad, which is the name of the city today. It is part of Russia now, although not connected to Russia proper. Kaliningrad is situated in a Russian enclave, on the Baltic Sea, wedged between Poland and Lithuania.
Back in the day, the problem occupying the minds of the good citizens was whether it was possible to make their walks while crossing all seven bridges exactly once. The concern was named after its host city as the Königsberg bridge problem. To get a glimpse of the nature of the issue, here is a drawing of Königsberg at the time. The bridges are indicated by ovals drawn around them. The city had two islands, but you can see only one island in its entirety; the other one extends to the right beyond the boundaries of the map.1
We don’t know exactly how, but the famous Swiss mathematician Leonhard Euler learned about the problem; it is mentioned in a letter sent on March 9, 1736, from the mayor of Danzig, a city in Prussia 80 miles to the west of Königsberg (Danzig is now called Gdańsk and belongs to Poland). The correspondence with Euler seems to have been part of an effort by the mayor to encourage the growth of mathematics in Prussia.
Euler was at the time living in Saint Petersburg in Russia. He worked on the problem and presented a solution to the members of the Saint Petersburg Academy of Sciences on August 26, 1735. In the following year, Euler wrote a paper, in Latin, describing his solution.2 The solution was negative: it was not possible to make a tour of the city crossing each bridge only once. That would be an interesting piece of mathematical history, but by solving the problem, Euler created a whole new branch of mathematics: the study of graphs.3
Before we go into graphs, let’s see how Euler tackled the problem. First of all, he abstracted the problem to its bare essentials. No detailed map of Königsberg is needed to represent the question. Euler drew the following diagram:4
He used the letters A and D for the two islands, and B and C for the two banks on the mainland. The next step is to abstract the diagram even more, away from the physical geometry, and to the connections between bridges, islands, and mainland, because this is what really matters for the problem:
We have drawn the landmasses as circles, and the bridges as lines connecting the circles. The problem then can be restated as follows: If you have a pencil, is it possible to start from any of the circles, put the pencil down, and follow the lines without lifting the pencil from the paper so that you can pass through every line exactly once?
Euler’s solution went as follows. Whenever you enter a landmass, you must leave it, except if this is the start or end of your walk. In order to do that, each landmass, apart from the start and finish, must have an even number of bridges so that each time you enter it, you can leave it from a different bridge, as required. Now go to the figure and count the number of bridges connecting each landmass. You will find out that all landmasses are connected with an odd number of bridges: A has five bridges, and B, C, and D have three bridges. Whichever of the landmasses we choose as starting and ending points, there will be two other landmasses that we will visit in the midst of our tour, and they have an odd number of bridges each. We cannot enter and leave them traversing their bridges only once.
Indeed, if we arrive at B at some point on our tour, we must have crossed a bridge to get to it. We will cross a second bridge to leave it. We must cross the third bridge at some later time because we are required to cross all bridges. But then we are stuck at B because there is no fourth bridge and we cannot cross a second time a bridge that we have already crossed. The same goes for C and D, which also have three bridges. Exactly the same argument holds for A as an intermediate point as it has five bridges; after crossing all five bridges of A, we won’t be able to leave it from a different, sixth bridge because such a bridge does not exist.
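Euler's counting argument is mechanical enough to check by machine. The sketch below tallies how many bridges touch each landmass, using the labels from Euler's diagram; a walk crossing every bridge once needs at most two landmasses with an odd bridge count:

```python
from collections import Counter

# One entry per bridge of eighteenth-century Königsberg.
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

degree = Counter()
for u, v in bridges:
    degree[u] += 1
    degree[v] += 1

assert degree["A"] == 5
assert degree["B"] == degree["C"] == degree["D"] == 3
odd = [lm for lm, d in degree.items() if d % 2 == 1]
assert len(odd) == 4   # more than two odd landmasses, so no walk exists
```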
The figure we drew consists of circles and lines connecting them. To use the proper terminology, we created a structure that is composed of nodes or vertices connected with edges or links between them. A structure that is composed of sets of nodes and edges is a graph; Euler was the first to recognize graphs as a structure and explore their properties. In today’s language, the Königsberg bridge problem deals with paths: a path in a graph is a sequence of edges that connect a sequence of nodes. Then the Königsberg problem is the problem of finding a Eulerian path or Eulerian walk: a trail through a graph such that each edge is visited exactly once. A path that starts and ends at the same node is called a tour or circuit. If we also add the restriction (not in the original problem) that we want the Eulerian path to start and finish at the same point, then we have a Eulerian tour or Eulerian circuit.
The applications of graphs are so numerous that they fill entire books. Anything that can be modeled by nodes connected to other nodes can be represented as a graph. Once we do that, we can ask all kinds of interesting questions about it; here we’ll have the opportunity to take just a glance.
Before we do that, though, here is a small detail to please the most rigorous minded of readers. We mentioned that a graph is a structure that comprises sets of vertices and edges. In mathematics, a set does not contain the same item twice. Yet in our representation of Königsberg, we have the same edge appear more than once; there are, for example, two edges between A and B. An edge is distinguished by its starting and ending points, so the two edges between A and B are in fact two instances of the same edge. Then the set of the edges is not really a set; it is a multiset—that is, a set that allows for multiple instances of its elements. In the same way, the Königsberg graph is not really a graph but rather a multigraph.
The definition of a graph is wide so that it can encompass everything that can be represented as things connected to other things. The graph may have some relevance to the topology of a place, but the nodes and links may have nothing to do with points in space.
A social network is an example of such a graph. In a social network, nodes are social actors (these may be individuals or organizations), and the links represent interactions between them. The social actors may be real-world actors, and the links may be their collaborations in films. The social actors can be us, and the links may be our connections to other people in a social network application. We can then use social networks to find communities of people, starting from the premise that communities are formed by people who interact with each other. There exist algorithms that are able to efficiently find communities in graphs with millions of nodes.
The edges in the Königsberg graph are not directed, meaning that we can traverse them both ways; for example, we can go from A to B and B to A. The same goes for social networks, when the connections are reciprocal. That is not always necessary. Depending on our applications, edges in a graph may be directed. When this happens, we draw the edges with arrows at their ends. Directed graphs are called digraphs for short. You can see a digraph below. Note that this is not a multigraph; the edge from A to B is not the same as the edge from B to A.
The World Wide Web is an example of a (huge) directed graph. We can represent the web with nodes standing in for web pages and edges standing in for the hyperlinks between each pair of pages. This graph is a directed graph, because a page may link to another page, but that other page does not necessarily link back to the first page.
When it is possible to start from a node in a graph, traverse edges, and come back to the node we started from, we say that the graph has a cycle. Not all graphs have cycles. The Königsberg graph has cycles—although it does not have a Eulerian circuit. A famous cyclic graph (actually a multigraph) in the history of science is August Kekulé’s model of the molecular structure of benzene:5
A graph without a cycle is called an acyclic graph. Directed acyclic graphs form an important class of graphs. We usually call them dags. Dags have many uses; for example, we use them to represent priorities between tasks (tasks are nodes, and priorities are links between them), dependency relations, prerequisites, and other similar arrangements. We’ll leave aside acyclic graphs now and turn our attention to cyclic graphs, which will provide us with a first window on algorithms on graphs.
One of the most important scientific developments of the last decades has been the decoding of the human genome. Thanks to the techniques that were developed in that effort, we can now investigate genetic diseases, detect mutations, and study genomes of extinct species, among other fascinating applications.
Genomes are encoded in the DNA, a large organic molecule that is composed of a double helix. The double helix is made up of four bases: cytosine (C), guanine (G), adenine (A), and thymine (T). Each part of the double helix is constructed from a series of bases, like ACCGTATAG. The other part of the double helix is constructed from bases that are connected with their corresponding bases on the first part, according to the rules A-T and C-G. So if one part of the helix is ACCGTATAG, the other part will be TGGCATATC.
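The pairing rules amount to a character-for-character substitution, which can be checked with a one-line translation table:

```python
# A-T and C-G pairing as a character translation table.
pairing = str.maketrans("ACGT", "TGCA")

assert "ACCGTATAG".translate(pairing) == "TGGCATATC"
```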
In order to find the composition of an unknown DNA piece, we can work as follows. We create many copies of the chain and break them up into little fragments—for instance, fragments containing three bases each. Using specialized instruments, we can identify such small fragments easily. In this way we end up with a set of known fragments. We are then left with the problem of assembling the fragments into a DNA sequence, whose composition we will then know.
Suppose then that we have the following fragments, or polymers as they are known: GTG, TGG, ATG, GGC, GCG, CGT, GCA, TGC, CAA, and AAT. Each one of them has a length of three; to find the DNA sequence from which they were broken up, we create a graph. In that graph, the vertices are polymers of length two that are derived from the polymers of length three, taking from each polymer of length three its first two and its last two bases. So from GTG we will get GT and TG, and from TGG we will get TG and GG. In the graph, we add one edge for every one of the initial polymers of length three that was used to derive the two vertices. We give the name of the polymer to that edge. From ATG we get vertices AT and TG and the edge ATG. You can see the graph that results from our example:
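Deriving the vertices and edges from the fragments is one slice per fragment; a minimal sketch (the dictionary representation is an assumption for illustration):

```python
fragments = ["GTG", "TGG", "ATG", "GGC", "GCG",
             "CGT", "GCA", "TGC", "CAA", "AAT"]

# Each three-base fragment contributes one edge between two-base vertices:
# from its first two bases to its last two bases.
edges = {f: (f[:2], f[1:]) for f in fragments}

assert edges["GTG"] == ("GT", "TG")
assert edges["ATG"] == ("AT", "TG")
vertices = {v for pair in edges.values() for v in pair}
assert vertices == {"GT", "TG", "GG", "AT", "GC", "CG", "CA", "AA"}
```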
With the graph we have created, we only need to find a tour in the graph that visits all edges exactly once—that is, an Eulerian circuit—in order to find the initial DNA sequence. The Hierholzer algorithm for finding Eulerian circuits on graphs was published by the German mathematician Carl Hierholzer in 1873 and goes like this:6
If we use the algorithm in our example graph, we will find the path in the following figure:
We started from AT and made the tour AT → TG → GG → GC → CA → AA → AT. We made a tour, but we did not cover all the edges. We see that TG has an edge, TGC, that we have not covered yet. So we go to TG and do a tour starting along the TGC edge, getting TG → GC → CG → GT → TG. We splice the second path into the first, getting the one in the figure, AT → TG (→ GC → CG → GT → TG) → GG → GC → CA → AA → AT. If we walk the resulting path from the first node to the last, without stepping on the last, and concatenate the vertices, keeping their common base only once, we get the DNA sequence ATGCGTGGCA. You can verify that this sequence contains all the polymers with which we started; CAA and AAT are found if you wrap around when you reach the end of the sequence and go to the beginning.
In this particular illustration, we only found one additional tour that we spliced into the original one. In general, there may be more; step 3 of the algorithm is repeated as long as there are vertices with edges that we have not covered yet. Hierholzer’s algorithm is fast: if implemented properly, it runs in linear time, O(n), where n is the number of edges in the graph.7
Suppose you are organizing a tournament in which the contestants will compete in pairs, so we’ll have a series of matches. We have eight contestants, and each contestant will play four matches. Our problem is how to schedule the tournament. We want to schedule the matches so that each contestant plays only one match per day.
An obvious solution is to have just one match per day and allow the tournament to last as long as needed. As we have eight contestants and each contestant plays four matches, the tournament would roll out over 16 days (8 × 4 / 2 = 16; we divide by two so as not to count each match twice). We’ll name the eight contestants Alice, Bob, Carol, Dave, Eve, Frank, Grace, and Heidi. This allows us to use only the initial letter of their names to identify them.
We can find a better solution if we model the problem as a graph. We’ll have a vertex for each player and an edge for each match. Then the graph will look like the one on the left below. On the right, we have labeled the edges with the day on which the corresponding match will take place. How did we find this solution?
We agree to number the tournament days consecutively. Let the tournament start on day zero. We’ll schedule all matches, one by one.
This algorithm looks deceptively simple, and you may doubt that it really solves our problem. So let’s walk through it and see what happens. In the following table we can see the matches, one by one, and the day on which we schedule each match, as we apply the algorithm on the graph. You should read the first two columns of the table and then the next two:
| Match | Day | Match | Day |
|---|---|---|---|
| A, B | 0 | C, F | 3 |
| A, D | 1 | C, G | 2 |
| A, E | 2 | D, G | 3 |
| A, H | 3 | D, H | 2 |
| B, C | 1 | E, F | 0 |
| B, E | 3 | E, H | 1 |
| B, F | 2 | F, G | 1 |
| C, D | 0 | G, H | 0 |
我们首先进行 Alice 对阵 Bob 的比赛。Alice 和 Bob 在第零天(也就是我们分配比赛的那一天)都没有参加任何其他比赛。
We start by taking the match Alice versus Bob. Neither Alice nor Bob play any other match on day zero—that is, the day on which we’ll assign the match.
然后,我们选取另一场尚未安排的比赛——比如 Alice 对阵 Dave。虽然没有强制要求,但我们会按照字典顺序选取比赛选手。不过,我们也可以用其他任何方式,甚至是随机的,只要每场比赛只处理一次即可。Alice 已经安排了第 0 天的比赛,所以最早可以参加比赛的日期是第一天。
We then take another match we have not scheduled yet—say, Alice versus Dave. Although there is no requirement to do so, we’ll take the match players in lexicographical order as we continue, but bear in mind that we could take them in any other way, even randomly, as long as we treat each match only once. Alice already has a match scheduled on day zero, so the earliest available day for the match is day one.
接下来是爱丽丝和伊芙的比赛。爱丽丝在第零天和第一天都已有安排,所以我们会把这场比赛安排在第二天。爱丽丝的最后一场比赛是和海蒂的;爱丽丝在第零天、第一天和第二天都已有安排,所以这场比赛只能安排在第三天了。
Next comes the match between Alice and Eve. Alice is booked on day zero and day one, so we’ll schedule it on day two. Alice’s final match will be with Heidi; Alice is engaged on days zero, one, and two, so this will have to go on day three.
我们已经安排好了 Alice 的比赛。接下来是 Bob 的比赛,除了我们已经安排好的与 Alice 的比赛之外,我们需要安排 Bob 对阵 Carol。Bob 已经安排在第零天(与 Alice 一起),所以这场比赛必须在第一天进行。安排 Bob 对阵 Eve 时,我们注意到 Bob 已经在第零天和第一天有事(我们刚刚安排好),而 Eve 计划在第二天与 Alice 比赛;因此,我们安排 Bob 对阵 Eve 的比赛在第三天。说到 Bob 对阵 Frank,Bob 在第零天和第一天都有比赛,但第二天有空,而 Frank 到目前为止还没有比赛。所以 Bob 对阵 Frank 的比赛在第二天进行,比 Bob 对阵 Eve 的比赛早。
We are done with Alice. Moving on to Bob’s matches, except for the one with Alice, which we have already scheduled, we need to plan Bob versus Carol. Bob is already scheduled on day zero (with Alice), so this match will have to go on day one. Scheduling Bob versus Eve, we notice that Bob is already engaged on day zero and day one (we just scheduled that), while Eve is scheduled to play on day two with Alice; we therefore schedule Bob versus Eve on day three. Going to Bob versus Frank, Bob has matches on days zero and one, but is free on day two, while Frank has no matches at all as of yet. So Bob versus Frank goes on day two, earlier than Bob versus Eve.
Bob 结束后,我们将安排 Carol 的比赛。Carol 和 Dave 在第 0 天都没有安排比赛,所以 Carol 对阵 Dave 的比赛将在锦标赛第一天进行。之后,Carol 对阵 Frank 的比赛可以安排在第三天进行,因为 Carol 会在第 0 天(我们刚刚安排好)和第一天(与 Bob 比赛,之前已经安排好)进行比赛,而 Frank 会在第二天与 Bob 比赛(之前也安排好)。Carol 对阵 Grace 的比赛将在第二天早些时候进行,因为 Grace 目前没有其他比赛安排,而 Carol 在第二天仍然有空。
After Bob, we’ll deal with Carol’s matches. Neither Carol nor Dave have a match scheduled on day zero, so Carol versus Dave will go on the first day of the tournament. After this, the Carol versus Frank match can take place on day three, because Carol plays matches on day zero (we just arranged that) and day one (with Bob, arranged previously), while Frank plays with Bob on day two (also arranged previously). Carol versus Grace will take place earlier, on day two, as Grace has no other matches planned as of yet and Carol is still free on day two.
我们对其余比赛也采取类似的做法;有趣的是,图中内外两个方格上的比赛都会在最初两天内进行。这相当于两个不同的小组先并行比赛,然后才开始互相对阵。最终,我们找到的解决方案比需要 16 天的朴素方案有了显著改进:我们只需要 4 天!
We proceed similarly with the rest of the matches; it is interesting that the matches in the inner and outer squares of the graph will happen as early as the first two days. These are two different groups playing in parallel before they start playing between them. At the end, the solution we find is a significant improvement over the naive solution requiring 16 days; we only need four!
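The scheduling procedure walked through above can be sketched in a few lines of Python: take the matches one by one and give each one the first day on which neither player is already busy. The function and variable names are ours, and this is a minimal sketch rather than the book's own code; the matches are listed in the lexicographic order the text follows.

```python
def greedy_schedule(matches):
    """Assign each match the first day free for both players (greedy edge coloring)."""
    day = {}  # match (edge) -> day (color)
    for u, v in matches:
        # Days already taken by any scheduled match involving u or v.
        busy = {d for (a, b), d in day.items() if u in (a, b) or v in (a, b)}
        # First day not in use by either player.
        d = 0
        while d in busy:
            d += 1
        day[(u, v)] = d
    return day

# The sixteen matches of the tournament, in lexicographic order.
matches = [("A", "B"), ("A", "D"), ("A", "E"), ("A", "H"),
           ("B", "C"), ("B", "E"), ("B", "F"), ("C", "D"),
           ("C", "F"), ("C", "G"), ("D", "G"), ("D", "H"),
           ("E", "F"), ("E", "H"), ("F", "G"), ("G", "H")]

schedule = greedy_schedule(matches)
print(max(schedule.values()) + 1)  # number of days needed: 4
```

Running this reproduces the table above: Alice versus Bob on day 0, Alice versus Heidi on day 3, and so on, with the whole tournament fitting in four days.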
这个锦标赛赛程安排问题实际上是一个更普遍问题的实例:边着色问题。图的边着色是指为边分配颜色,使得任何两条相邻边的颜色都不相同。这里应该用象征性的方式理解颜色。在我们的例子中,颜色是天数;通常,它们可以是任何其他不同的值集合。如果我们想要着色的不是边,而是图的顶点,使得由一条边连接的任何两个顶点的颜色都不相同,那么我们就有了顶点着色问题。边和顶点着色属于更广泛的图着色问题,这并不奇怪。
This tournament scheduling problem is in fact an instance of a more general problem: the edge coloring problem. An edge coloring of the graph is an assignment of colors to edges so that no two adjacent edges have the same color. Now color should be taken figuratively here. In our example, the colors are the days; in general, they can be any other set of distinct values. If instead of the edges, we want to color the vertices of the graph so that no two vertices that are linked by an edge share the same color, then we have the vertex coloring problem. Both edge and vertex coloring belong to the wider class of, no surprise, graph coloring problems.
我们描述的边着色算法简单高效(它逐一处理每条边,并且只处理一次)。这就是所谓的贪婪算法。贪婪算法试图通过在每个阶段寻找最佳解(而不是总体最优解)来解决问题。贪婪算法在很多问题中都很有用,因为在解决方案的每个阶段我们都需要做出选择,而我们的规则是“现在看起来最好的”。这种在算法演进过程中指导我们选择的策略称为启发式算法,源自希腊语heuriskein,意思是“寻找”(也就是找到一个解)。
The algorithm we described for edge coloring is simple and efficient (it takes each edge one by one, and only once). It is a so-called greedy algorithm. Greedy algorithms are algorithms that try to solve a problem by finding the best solution at each stage, not the optimal solution in general. Greedy algorithms are useful in many problems when at each stage of the solution we have a choice to make and our rule is “what looks best now.” Such strategies that guide our choices in the evolution of an algorithm are called heuristics, from the Greek heuriskein, which means “to find” (a solution, that is).
稍加思考,我们就能意识到,在算法中,就像在现实生活中一样,当下看起来最好的策略未必就是最好的。延迟满足或许会有回报;当下最好的选择,或许会让我们陷入日后后悔的陷阱。想象一下你正在爬山。贪婪的启发式算法会在每个点选择最陡峭的路径(我们假设你的攀爬能力无与伦比)。这并不一定会带你到达顶峰:它很可能会把你带到一个高原,从那里出发,唯一的出路就是返回。真正的登顶之路,或许是穿过一些较为平缓的山坡。
With some thought we can realize that in algorithms, as in real life, what looks best right now may not really be the best strategy. It may pay off to delay gratification; the best choice right now may lead us to a trap that we’ll regret later on. Imagine you are climbing a mountain. The greedy heuristic would be to select the steepest path at each point (we assume that your climbing prowess is unparalleled). This will not necessarily lead you to the top: it may well lead you to a plateau, from which the only way is back. The real way to the top may lie through gentler slopes.
攀爬的比喻在计算机科学的问题求解中经常使用。我们将问题建模,使得解位于我们所能做出的各种可能步骤的“顶端”,然后尝试找到正确的走法;这被称为爬山法。当我们到达某个类似高原的地方时,我们说我们到达了局部最优,而不是全局最优,即我们所追求的最高峰。
The climbing metaphor is frequently used in problem solving in computer science. We model our problem so that the solution lies at “the top” of the possible moves we can make and try to find the correct moves; this is called a hill climbing approach. When we arrive at something like a plateau, we say we arrived at a local optimum, but not the global optimum, the highest peak that we are after.
从爬山法回到锦标赛安排:我们为每场比赛选择了第一个可用的日期。不幸的是,这可能不是安排所有比赛的最佳方式。事实上,图着色是一个难题。我们给出的算法不能保证给出最优解,即需要最少天数(或一般来说,最少颜色数)的解。与一个节点相邻的边数称为它的度。可以证明,如果图中节点的最大度为 d,则最多可以用 d 或 d + 1 种颜色为边着色;给图的边着色所需的颜色数称为它的色指数。在我们这个特定的例子中,解是最优的,色指数等于 d = 4,我们用了四天。然而,我们的算法在其他图中可能找不到最优解,它可能给出一个更差的解。贪婪图着色的好处是我们知道这个解最多能差多远:它给出的解最多可能需要 2d − 1 种颜色,而不是 d,但不会比这更糟。
From hill climbing back to tournament scheduling, we selected the first available day for each match. Unfortunately, this might not be the best way to schedule all matches. Indeed, it turns out that graph coloring is a difficult problem. The algorithm that we gave is not guaranteed to give the optimal solution—that is, the solution requiring the smallest number of days (or colors, in general). The number of edges adjacent to a node is called its degree. It can be proven that if the largest degree of any node in the graph is d, the edges can be colored with at most d or d + 1 colors; the required number of colors for the edges of a graph is called its chromatic index. In our particular example, the solution is optimal, with chromatic index equal to d = 4, and we used four days. Our algorithm, however, may not be able to find the optimal solution in some other graph. It may give us a solution worse than that. The good thing about greedy graph coloring is that we know how far off that solution might be: the solution it will give may need up to 2d − 1 colors, instead of d, but no worse than that.
如果您想了解这是如何发生的,请考虑一个由连接到中心节点的“星星”组成的图表,如下页所示:
If you want to see how this may happen, consider a graph that consists of “stars” connected to a central node, like the one on the next page:
如果我们有 k 个星星,每个星星有 k 条边,外加一条通向中心节点的边,而我们先给星星上色,那么我们会用 k 种颜色给星星的边上色。然后我们还需要 k 种额外的颜色把星星连接到中心节点,总共需要 2k 种颜色。这就是我们在左边所做的。但这不是最优解。如果我们先给连接星星和中心节点的边上色,那需要 k 种颜色。然后我们只需要一种额外的颜色就能给星星本身上色,总共需要 k + 1 种颜色。你可以在右边看到我们是如何做到这一点的。所有这些都符合理论,因为每个星星的度数为 k + 1。
If we have k stars, where each star has k edges plus an edge to the central node, and we start by coloring the stars, we’ll use k colors to color the edges of the stars. Then we’ll need k additional colors to connect the stars to the central node. The total is 2k colors. This is what we did on the left. But this is not the optimal solution. If we start by coloring the edges connecting the stars to the central node, we’ll need k colors for that. Then we can color the stars themselves using only one additional color, for a total of k + 1 colors. You can see how we can do that on the right. All this is in accordance with theory, as each star has degree k + 1.
问题在于,贪婪算法按一种最终并非最优的方式(用恰当的术语来说,并非全局最优的方式)决定了给边着色的顺序。它可能恰好找到最佳解,也可能找不到。不过,与最优解的差距也不是很大。这让人松了一口气,因为图着色非常困难:如果我们想要一个能为每个图找到最佳解的精确算法,该算法的复杂度将是指数级的,约为 2^n,其中 n 是图中的边数。因此,除非图非常小,精确的边着色算法是无用的。
The problem is that the greedy algorithm decides to order the edges to color in a way that is not optimal at the end, or to use the proper terminology, in a way that is not globally optimal. It might hit on the best solution, but it might not. Then again, the difference from the optimum solution is not that great. That is a relief because graph coloring is so difficult that if we want an exact algorithm that can find the best solution for every graph, the algorithm will have exponential complexity, about 2^n, where n is the number of edges in the graph. Exact edge coloring algorithms are therefore useless, except for tiny graphs.
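The effect of the coloring order can be checked directly. The sketch below (the construction and names are ours, not the book's) builds the star graph for k = 3 and colors its edges greedily in the two orders discussed: star edges first and then the edges to the central node, or the other way around.

```python
def greedy_edge_coloring(edges):
    """Color edges in the given order; each edge gets the first color
    not already used by an edge sharing one of its endpoints."""
    color = {}
    for u, v in edges:
        used = {c for (a, b), c in color.items() if u in (a, b) or v in (a, b)}
        c = 0
        while c in used:
            c += 1
        color[(u, v)] = c
    return color

k = 3
hubs = [f"h{i}" for i in range(k)]
# Each star: a hub with k leaf edges, plus one edge to the central node.
star_edges = [(h, f"{h}_leaf{j}") for h in hubs for j in range(k)]
center_edges = [(h, "center") for h in hubs]

stars_first = greedy_edge_coloring(star_edges + center_edges)
center_first = greedy_edge_coloring(center_edges + star_edges)

print(len(set(stars_first.values())))   # 2k = 6 colors
print(len(set(center_first.values())))  # k + 1 = 4 colors
```

The same greedy rule gives 2k colors in one order and the optimal k + 1 in the other, exactly as the figure described in the text shows.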
我们提出的贪婪算法除了实用之外,还有一个额外的优点。它是一种在线算法:即使输入在开始时未知,而是随着算法的进展而出现,该算法也能正常工作。我们无需知道所有边即可开始运行该算法。即使在运行算法时,图是以逐个边的方式构建的,算法也能正常工作。如果在我们开始安排比赛之后,仍有球员报名参加比赛,就会出现这种情况。我们可以随时为每条边(比赛)着色,并且无论图何时完成,我们都会准备好边着色。此外,如果图以这种方式逐步创建,则此贪婪算法是最优算法;如果图是在解决问题的同时构建的,则根本不存在精确的算法,无论多么低效。8
The greedy algorithm we have presented has one additional nice property (apart from being practical). It is an online algorithm: an algorithm that works even if the inputs are not known when we start but instead arrive on the scene as we go. We don’t need to know all the edges to start running the algorithm. The algorithm will work correctly, even if the graph is constructed in a piecemeal fashion, one edge at a time, while we are running the algorithm. This would happen if players are signing up for the tournament even after we have started scheduling the matches. We will be able to color each edge (match) as it comes, and whenever the graph is finished, we’ll have an edge coloring ready. Moreover, this greedy algorithm is the optimum algorithm if the graph is created incrementally in this way; no exact algorithm, no matter how inefficient, exists at all when the graph is constructed while we are solving the problem.8
正如我们所见,贪婪算法的工作原理是每一步都做出最佳决策——而这未必是整体上的最佳决策。它带有某种机会主义的性质,或者说“及时行乐”的味道。不幸的是,正如伊索寓言所言,只顾眼前享乐的蚱蜢到了冬天可能会追悔莫及,而为未来做准备的蚂蚁最终过得舒适温暖。9 在锦标赛的安排中,我们发现蚱蜢的下场可能不会那么糟糕。现在轮到蚂蚁复仇了。
As we saw, a greedy algorithm works by taking the best decision at each step—which may not be the best decision overall. It has a somehow opportunistic nature or carpe diem feeling to it. Unfortunately, as Aesop’s fable tells us, a grasshopper living for the day may yet live to regret the winter, when the ant, who is preparing for the future, ends up cozy and warm.9 In the planning of tournaments, we found that the grasshopper may not end up so badly. Now it is time for the ant’s revenge.
在第一章中,我们讨论了通过枚举所有可能的路径来寻找网格上两点之间的最短路径的不可行性。我们发现,由于路径数量急剧增加,这在实践中根本无法实现。现在,凭借我们对图的了解,我们将发现一种方法。事实上,我们将把这个问题提升到一个更高的层次。我们不再寻找网格上的最短路径(网格具有某种良好的几何形状,并且所有点之间的距离都相等),而是允许任何几何形状,甚至可以添加不同的点间距离。
In chapter 1, we discussed the infeasibility of trying to find the shortest path between two points on a grid by enumerating all the possible paths. We saw that this is impossible to do in practice because the number of paths increases tremendously. Now with our knowledge of graphs, we will see that there is a way. In fact, we’ll take the problem up a notch. Instead of looking for the shortest path on a grid, which has a kind of nice geometry and on which all distances between points are equal, we will allow any geometric shape and even add different distances between points.
为此,我们将创建一个图,其中的节点和边代表一幅地图,并希望找到地图上两个节点之间的最短路径。此外,我们将为每条边附加一个权重。权重可以是正数或零,对应于两个连接节点之间距离的度量。它可以是英里数的距离或小时数的行程时间;任何其他非负指标都可以。路径长度是路径上所有权重的总和;两个节点之间的最短路径是长度最短的路径。如果所有权重都等于 1,则路径长度等于路径上的边数。一旦我们允许权重具有其他值,情况就不再如此。
To do that, we’ll create a graph where we have nodes and edges representing a map, and want to find the shortest way between two nodes on the map. Moreover, we’ll attach a weight to each edge. The weight may be positive or zero, and will correspond to a measure of the distance between the two connected nodes. It may be distance in miles or travel time in hours; any other nonnegative metric will do. Then the path length is the sum of the weights along the path; the shortest path between two nodes is the path with the smallest length. If all weights are equal to one, then the path length is equal to the number of edges on the path. Once we allow weights to have other values, this is no longer true.
在下图中,我们有六个节点,由九条具有不同权重的边连接,并希望找到从节点 A 到 F 的最短路径。
In the following graph, we have six nodes connected by nine edges with varying weights, and want to find the shortest path to travel from node A to node F.
如果我们采用贪婪启发式算法,我们会从节点 A 出发到 C,那么最佳选择是前往节点 E,然后从那里前往节点 F。路径 A、C、E 和 F 的总长度为 8,但这并非最佳路径。最佳路径是从 A 到 C 再到 D,然后到 F,总长度为 6。因此,贪婪启发式算法不起作用。与锦标赛规划算法不同,它无法保证其相对于实际最短路径的最差性能。然而,与锦标赛规划算法不同,存在有效的算法来寻找最短路径,因此实际上根本没有理由使用贪婪启发式算法。
If we adopt a greedy heuristic, we’ll start by going from node A to C, then the best choice is to go to node E, and from there we make our way to node F. The total length of the path A, C, E, and F is eight, which is not, however, the best path. The best path is to go from A to C to D, and then to F, for a total length of six. So the greedy heuristic does not work, and in contrast to tournament planning, there are no guarantees as to its worst performance in relation to the actual shortest path. Nevertheless, and again in contrast to tournament planning, there exist efficient algorithms for finding the shortest paths, so in fact there is no reason to use the greedy heuristic at all.
1956年,年轻的荷兰计算机科学家埃兹格·迪杰斯特拉(Edsger Dijkstra)和未婚妻在阿姆斯特丹购物。他们逛累了,便在一家咖啡馆的露台上坐下来喝咖啡。迪杰斯特拉一边喝咖啡,一边思考如何找到从一个城市到另一个城市的最佳路线。他只用了20分钟就设计出了解决方案,尽管该算法花了三年时间才得以发表。迪杰斯特拉的职业生涯辉煌,但令他惊讶的是,这20分钟的发明却成为了他成名的基石。10
In 1956, a young Dutch computer scientist, Edsger Dijkstra, was shopping in Amsterdam with his fiancée. Having got tired, they sat down at a café terrace to drink a cup of coffee, where Dijkstra thought about the problem of finding the best way to go from one city to another. He designed the solution in 20 minutes, although the algorithm took some time, three years, to get published. Dijkstra led an illustrious career, yet this 20-minute invention remained, to his amazement, a cornerstone of his fame.10
那么这个算法是如何进行的呢?我们想要找到图中从一个节点到所有其他节点的最短路径。该算法使用了一种叫做松弛(relaxation )的思想:我们为想要找到的值(这里指的是距离)赋值。一开始,我们的估计值是最糟糕的。然后,随着算法的进展,我们可以将这些估计值从一开始的极差逐渐松弛到越来越好,直到得到正确的值。
So how does the algorithm go? We want to find the shortest paths from one node to all other nodes in a graph. The algorithm uses an idea called relaxation: we assign estimates for the values we want to find (here, distances). In the beginning, our estimates are the worst possible. Then as the algorithm progresses, we are able to relax these estimates from the extremely bad ones we started with to progressively better and better ones, until we arrive at the correct values.
在 Dijkstra 算法中,松弛过程如下。我们首先为所有节点到起始节点的距离赋一个最差的可能值:我们把距离设为无穷大;显然不可能有比这更差的了!在下图中,我们把最短路径的初始估计值及该路径上的前一个节点,分别标在每个节点的上方或下方。对于节点 A,我们标注 0/−,因为从 A 到 A 的距离为零,而且 A 没有前一个节点。对于所有其他节点,我们标注 ∞/−,因为距离是无穷大,而且我们对通往它们的最短路径一无所知。
In Dijkstra’s algorithm, relaxation proceeds as follows. We begin by assigning the worst possible value for the distances of all nodes from our starting node: we set the distance to infinity; clearly there cannot be anything worse than that! In the following figure, we have placed the initial estimate for the shortest path and previous node in that path above or below each node. For the A node, we have 0/− because the distance from A to A is zero and there is no previous node to A. For all other nodes, we have ∞/− because the distance is infinity and we have no idea about the shortest path to them.
我们选取目前距离 A 最短的节点。这个节点就是 A 本身。它是我们的当前节点,因此我们将其标记为灰色。
We take the node with the shortest distance from A thus far. This is A itself. That is our current node, so we mark it gray.
从 A 出发,我们可以检查到其邻居 B 和 C 的最短路径的估算值。最初我们将它们设置为无穷大,但实际上现在我们发现,从 A 到 B 的代价是 3,而从 A 到 C 的代价是 1。我们更新这些估算值,并指出这些估算值是通过 A 实现的;我们在 B 上方写 3/A,在 C 下方写 1/A。在算法的剩余部分,我们完成了对节点 A 的处理。我们相应地更新图形,将 A 标记为黑色。我们移至当前估算值最优的未访问节点。即节点 C。
From A we can check the estimates for the shortest paths to its neighbors, B and C. Initially we had set them at infinity, but in fact now we find out that we can get to B from A at a cost of 3 and we can get to C from A at a cost of 1. We update these estimates and also indicate that the estimates are through A; we write 3/A above B and 1/A below C. We are done with node A for the rest of the algorithm. We update the figure accordingly, marking A black. We move to the unvisited node with the best current estimate. That is node C.
从节点 C 开始,我们检查到其邻居节点 D 和 E 的最短路径估计值。它们位于无穷远处,但是现在我们看到我们可以通过 C 到达每个节点。从 A 到 D 经过 C 的路径总长度为 5,因此我们在 D 上方写上 5/C。从 A 到 E 经过 C 的路径总长度为 3,因此我们在 E 下方写上 3/C。我们已经完成了节点 C 的访问,因此将其标记为黑色,并移至当前估值最佳的未访问节点。节点 B 和 E 的估值均为 3,两者皆为最佳。我们可以任选其一。我们选择 B。
From node C, we check the estimates of the shortest paths to its neighbors, D and E. They were at infinity, but now we see that we can get to each one of them through C. The path from A to D through C has a total length of 5, so we write 5/C above D. The path from A to E through C has a total length of 3, so we write 3/C below E. We are done with node C so we mark it black and move to the unvisited node with the best current estimate. Both nodes B and E have an equally good estimate of 3. We can pick either. Let us pick B.
我们以相同的方式继续。从节点 B 开始,我们检查到其邻居节点 D 和 F 的最短路径估计值。我们已经有了经由 C 的、到 D 的长度为 5 的估计值;这比经由 B 会得到的长度 6 要好,因此我们让对 D 的估计保持不变。当前到 F 的估计值是无穷大,因此我们将其更新为经由 B 的 9。我们将 B 标记为已访问,并移至具有最佳当前估计值的未访问节点,即节点 E。
We work in the same way. From node B, we check the estimates of the shortest paths to its neighbors, D and F. We already have an estimate of length 5 for D, coming from C; that is better than the length 6 that we would get coming from B. So we let the estimate to D remain unchanged. The current estimate to F is infinite so we update it to 9, coming from B. We mark B as visited and move to the unvisited node with the best current estimate. That is node E.
E 的邻居是 F。从 E 到 F 的路径长度为 8,这比我们之前找到的通过 B 的路径要好。我们更新该路径,将 E 标记为已访问,并移至当前估计值最优的未访问节点,即节点 D。
E has F as a neighbor. The path to F from E has length of 8, which is better than the path we had found through B. We update the path, mark E as visited, and move to the unvisited node with the best current estimate, node D.
D 的邻居是 F,我们找到了一条从 E 到 F 的路径,长度为 8。由于我们可以通过 D 到达 F,总长度为 6,因此我们更新该路径。和之前一样,我们移动到当前估计值最优的未访问节点——实际上也是我们唯一未访问的节点 F。
D has F as a neighbor, to which we have found a path coming from E with length 8. As we can get to F through D with a total length of 6, we update that path. As before, we move to the unvisited node with the best current estimate—actually our only unvisited node, F.
从节点 F 开始,我们检查是否应该更新对其邻居节点 E 的估计。当前到 E 的路径长度为 3,而经过 F 的路径成本为 10。我们让 E 保持不变。访问 F 不会带来任何影响,但我们事先不可能知道这一点。由于我们已经访问了所有节点,算法结束。
From node F we check whether we should update our estimate for its neighbor, node E. The current path to E has a length of 3, while the path through F would have a cost of 10. We let E remain unchanged. Visiting F did not make any difference, but we could not have known that beforehand. As we have visited all nodes, the algorithm finishes.
在算法过程中,我们记录了最短路径上每个节点的路径长度和前一个节点。这样做是为了在算法完成后,如果想要找到从 A 到图中任何其他节点(例如 F)的最短路径,我们就从末端开始,一直走到起点。我们读到它的前身:D。我们得到 D 的前身是 C,然后 C 的前身是 A。从 A 到 F 的最短路径是 A、C、D 和 F,总长度为 6,正如我们在讨论开始时提到的那样。
When we were going through the algorithm, we were recording path lengths and the predecessor of each node along the shortest path. We did that so that if after finishing the algorithm we want to find the shortest path from A to any other node in the graph—for example, F—we start from the end and make our way to the start. We read its predecessor: D. We get the predecessor of D, which is C, and then the predecessor of C, which is A. The shortest path from A to F is A, C, D, and F with a total length of six, as we had mentioned way back at the start of our discussion.
最后,Dijkstra 算法找到了图中从起始节点到所有其他节点的所有最短路径。该算法非常高效,因为它的复杂度为 O((m + n) log n),其中 m 是图中的边数,n 是节点数。该算法分为以下几个步骤:
At the end, Dijkstra’s algorithm found all the shortest paths from the starting node to all other nodes in the graph. The algorithm is efficient, as its complexity is O((m + n) log n), where m is the number of edges in the graph and n is the number of nodes. Here is the algorithm as a set of steps:

1. Set the distance estimate of the starting node to zero and of every other node to infinity; mark all nodes as unvisited.
2. Among the unvisited nodes, pick the one with the smallest distance estimate and make it the current node.
3. For each unvisited neighbor of the current node, check whether the path through the current node is shorter than the neighbor’s current estimate; if so, update the estimate and record the current node as the neighbor’s predecessor.
4. Mark the current node as visited. If unvisited nodes remain, go back to step 2.
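The algorithm can be turned into a short program. The sketch below is ours, not the book's; since the original figure is not reproduced here, the edge weights are inferred from the walkthrough in the text (they reproduce every distance the walkthrough mentions), and Python's heapq priority queue plays the role of "pick the unvisited node with the best estimate."

```python
import heapq

def dijkstra(graph, start):
    """Shortest distances from start, plus each node's predecessor on its path."""
    dist = {node: float("inf") for node in graph}
    prev = {node: None for node in graph}
    dist[start] = 0
    visited = set()
    queue = [(0, start)]  # (estimate, node), smallest estimate popped first
    while queue:
        d, u = heapq.heappop(queue)
        if u in visited:
            continue  # stale entry: a shorter way to u was already found
        visited.add(u)
        for v, w in graph[u].items():
            if v not in visited and d + w < dist[v]:
                dist[v] = d + w   # relax the estimate
                prev[v] = u
                heapq.heappush(queue, (dist[v], v))
    return dist, prev

# Weights inferred from the walkthrough: A-B 3, A-C 1, B-D 3, B-F 6,
# C-D 4, C-E 2, D-F 1, E-F 5.
edges = [("A", "B", 3), ("A", "C", 1), ("B", "D", 3), ("B", "F", 6),
         ("C", "D", 4), ("C", "E", 2), ("D", "F", 1), ("E", "F", 5)]
graph = {n: {} for e in edges for n in e[:2]}
for u, v, w in edges:
    graph[u][v] = w
    graph[v][u] = w  # the graph is undirected

dist, prev = dijkstra(graph, "A")

# Reconstruct the path to F by following predecessors backward.
path, node = [], "F"
while node is not None:
    path.append(node)
    node = prev[node]
path.reverse()
print(dist["F"], path)  # 6 ['A', 'C', 'D', 'F']
```

Following the predecessors backward from F recovers the path A, C, D, F with length six, as in the discussion above.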
如果我们只对到特定节点的最短路径感兴趣,我们可以在步骤 2 中选择该节点进行访问时停止。一旦我们这样做,我们就已经找到了到达它的最短路径,并且它在算法的其余执行过程中不会改变。
If we are only interested in the shortest path to a particular node, we can stop when we pick it to visit in step 2. Once we do that, we have already found the shortest path to it, and it will not change in the rest of the algorithm’s execution.
只要图没有负权重,我们就可以在任何图中使用 Dijkstra 算法,无论图是否有向,即使它包含循环。如果边表示节点之间的某种奖励和惩罚,则可能会发生这种情况。好消息是,在存在负权重的情况下,我们可以使用其他有效的算法,但这突显出算法在适用性方面可能有特殊要求。当我们尝试寻找一种算法来解决我们的问题时,我们应该检查我们的问题是否满足算法的要求。否则算法将不起作用;但请注意,算法不能告诉我们它不起作用。如果我们在计算机上实现该算法,它仍然会执行其步骤,即使这样做没有意义。它会给出一个无意义的答案。我们必须确保我们使用正确的工具来完成正确的工作。
We can use Dijkstra’s algorithm in any graph, directed or not, even if it contains cycles, provided that it does not have negative weights. This might happen if the edges represent some kind of rewards and penalties between nodes. The good news is that there are other efficient algorithms that we can use in the presence of negative weights, but this highlights that algorithms may have particular requirements in their applicability. When we try to find an algorithm to solve our problem, we should check that our problem meets the requirements of the algorithm. Otherwise the algorithm will not work; but note that an algorithm cannot tell us that it does not work. If we implement the algorithm on a computer, it will still execute its steps even if it does not make sense to do so. It will produce an answer that will be nonsense. It is up to us to make sure that we are using the right tool for the right job.
举一个极端的例子,想象一下,如果一个图不仅有负权重,还包含一个边权之和为负的环(即负环),会发生什么。那么,任何算法都无法找到图中的最短路径,因为最短路径根本不存在。如果有负环,我们可以绕着它的边一圈一圈地走,每走一圈路径的长度都会减少。我们可以永远这样走下去,沿着这个环的路径长度最终会趋向负无穷。计算机科学家和程序员对往程序里放入对它毫无意义的东西有一个说法:垃圾进,垃圾出。找出垃圾、知道何时该用什么,是人类的责任。大学算法课程的一个重要部分,正是教会崭露头角的计算机科学家何时使用什么。
For an extreme example, think of what would happen with a graph that not only has negative weights but also a cycle where the sum of the edges is negative: a negative cycle. Then no algorithm would find the shortest paths in the graph because they do not exist. If we have a negative cycle, we can go round and round its edges, and every time the length of the path will be reduced. We can continue forever, and the path along the cycle will get to negative infinity. Computer scientists and programmers have a name for when we put something in a program that does not make sense for it: garbage in, garbage out. It is up to humans to ferret out the garbage and know what to use when. An important part of algorithm courses in universities is exactly to teach budding computer scientists what to use when.
当我们尝试寻找一种算法来解决问题时,我们应该检查问题是否符合该算法的要求。否则,算法将无法工作;但算法不能告诉我们它不起作用。
When we try to find an algorithm to solve our problem, we should check that our problem meets the requirements of the algorithm. Otherwise it will not work; but an algorithm cannot tell us that it does not work.
算法可以做各种各样的事情,从翻译文本到驾驶汽车,这一事实可能会让我们对算法的主要用途产生误解。答案可能看起来很平常。如果不使用算法进行数据搜索,你几乎不可能找到任何有用的计算机程序。
The fact that algorithms can do all sorts of stuff, from translating text to driving cars, can give us a misleading picture of what algorithms are mostly used for. The answer may seem mundane. It is unlikely that you will be able to find any computer program doing anything at all useful without employing algorithms for searching in data.
这是因为搜索几乎在所有情况下都会以某种形式出现。程序接收数据;它们经常需要在数据中搜索某些内容,因此几乎肯定会使用搜索算法。搜索不仅是程序中的一项常见操作,而且由于它频繁发生,搜索可能是应用程序中最耗时的操作。一个好的搜索算法可以显著提高速度。
That is because searching in one form or another appears in almost every context. Programs take in data; often they will need to search for something in them and so a searching algorithm will almost certainly be used. Not only is searching a frequent operation in programs but, because it happens frequently, searching can be the most time-consuming operation in an application. A good search algorithm can result in dramatic improvements in speed.
搜索是指在一组物品中寻找特定物品。这个问题的一般描述包含多种变体。项目是以与我们的搜索相关的某种方式排列的,还是随机排列的,这有很大的不同。另一种情况是,项目被逐一提供给我们,我们必须在面对它时就决定我们是否找到了正确的项目,而没有能力重新思考我们的决定。如果我们在一组项目中反复搜索,那么知道某些项目是否比其他项目更受欢迎就很重要,这样我们最终会更频繁地搜索它们。我们将在本章中研究所有这些变体,但请记住,还有更多。例如,我们将只讨论精确搜索问题,但在许多应用中我们需要近似搜索。想想拼写检查:当你输入错误时,拼写检查器必须搜索与它无法识别的单词相似的单词。
A search involves looking for a particular item among a group of items. This general problem description encompasses several variations. It makes a big difference whether the items are ordered in some way that is related to our search or come in random order. A different scenario occurs when the items are given to us one by one and we have to decide if we have found the correct one right when we confront it, without the ability to rethink our decision. If we search repeatedly in a set of items, it is important to know if some items are more popular than others so that we end up searching for them more often. We will examine all these variations in this chapter, but keep in mind that there are more. For example, we will only present exact search problems, but there are many applications in which we need an approximate search. Think of spellchecking: when you mistype something, the spellchecker will have to search for words that are similar to the one it fails to recognize.
几乎在每种情况下都会出现以某种形式进行的搜索。...好的搜索算法可以显著提高速度。
Searching in one form or another appears in almost every context. . . . A good search algorithm can result in dramatic improvements in speed.
随着数据量的增加,在海量数据中高效搜索的能力变得越来越重要。我们将会看到,如果数据是有序的,搜索的扩展性将非常出色。在第一章中,我们提到大约 30 次探测就能在十亿个有序数据中找到目标;现在我们将看看如何实际做到这一点。
As the data volumes increase, the ability to search efficiently in a huge number of items has become more and more significant. We’ll see that if our items are ordered, the search can scale extremely well. In chapter 1 we stated that it is possible to find something among a billion sorted items in about 30 probes; now we will see how this can be actually done.
最后,搜索算法将让我们看到当我们从算法转向计算机程序的实际实现时潜伏的危险,而计算机程序必须在特定机器的范围内运行。
Finally, a search algorithm will give us a glimpse of the dangers that lurk when we move from an algorithm to an actual implementation in a computer program, which has to run within the confines of a particular machine.
最简单的搜索方法就像我们俗话说的大海捞针。如果我们想在一组对象中找到某个东西,而它们之间完全没有结构,那么我们唯一能做的就是一个接一个地检查,直到找到我们要找的东西,或者在用尽所有东西之后仍然找不到为止。
The simplest way to search is what we do to find the proverbial needle in a haystack. If we want to find something in a group of objects and there is absolutely no structure in them, then the only thing we can do is to check one item after the other until we either find the item we are looking for or fail to find it after exhausting all items.
如果你有一副牌,并且正在寻找其中的某一张,你可以从牌堆顶部开始取牌,直到找到你要找的那张或牌用完为止。或者,你也可以从牌堆底部开始一张一张地取牌。你甚至可以从牌堆中的随机位置取牌。原理是一样的。
If you have a deck of cards and are looking for a particular one in them, you can start taking off the cards from the top of the deck until you find the one you are looking for or run out of cards. Alternatively, you can start taking off the cards one by one from the bottom of the deck. You can even take off cards from random positions in the deck. The principle is the same.
通常我们不处理计算机中的物理对象,而是处理它们的数字表示。在计算机上表示数据组的一种常见方式是采用列表的形式。列表是一种数据结构,它以这样的方式包含一组事物,即我们可以从一个项目找到下一个项目。我们通常可以将列表想象为包含链接的项目,其中一个项目指向下一个项目,直到最后,最后一项指向任何内容。这个比喻与事实相差不远,因为在计算机内部,使用内存位置来存储项目。在链接列表中,每个项目包含两样东西:它的有效载荷数据以及列表中下一个项目的内存位置。内存中保存另一个位置的内存位置的位置称为指针。因此,在链表中,每个元素都包含指向下一个元素的指针。列表的第一个项目称为其头。列表中的项目也称为节点。最后一个节点不指向任何地方;我们说它指向null:计算机上的虚无。
Usually we do not deal with physical objects in computers but rather digital representations of them. A common way to represent groups of data on a computer is in the form of a list. A list is a data structure that contains a group of things in such a way that from one item we can find the next one. We can usually think of the list as containing linked items, where one item points to the next one, until the end, where the last item points to nothing. The metaphor is not far from the truth because internally the computer uses memory locations to store items. In a linked list, each item contains two things: its payload data and the memory location of the next item on the list. A place in memory that holds the memory location of another place in memory is called a pointer. Therefore in a linked list, each element contains a pointer to the next element. The first item of a list is called its head. The items in a list are also called nodes. The last node does not point to anywhere; we say that it points to null: nothingness on a computer.
列表是一系列元素的序列,但该序列不一定按照某些特定标准排序。例如,以下列表包含字母表中的一些字母:
A list is a sequence of items, but it is not necessary that the sequence is ordered using some specific criterion. For example, the following is a list containing some letters from the alphabet:
如果我们有一个无序列表,在其中查找某个项目的算法如下:
If we have an unordered list, the algorithm for finding an item on it goes like this:

1. Start at the head of the list.
2. If the current node holds the item we are looking for, stop: we have found it.
3. Otherwise, follow the pointer to the next node. If that pointer is null, stop: the item is not on the list. If not, go back to step 2.
这被称为线性搜索或顺序搜索。它本身并没有什么特别之处;它只是对“依次检查每个元素,直到找到我们想要的那个”这一想法的直接实现。实际上,该算法会让计算机从一个指针跳到另一个指针,直到找到我们要找的元素或为空。下面我们展示了搜索 E 或 X 时发生的情况:
This is called a linear or sequential search. There is nothing special about it; it is a straightforward implementation of the idea of examining each single thing in turn until we find the one we want. In reality, the algorithm makes the computer jump from pointer to pointer until it either reaches the item we are looking for or null. Below we show what is happening when we search for E or X:
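As a sketch, here is a linked list and the sequential search on it in Python. The class and function names, and the particular letters stored, are ours for illustration; the search for E succeeds and the search for X follows the pointers all the way to null.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data  # the payload
        self.next = next  # pointer to the next node, or None (null)

def sequential_search(head, item):
    """Follow the pointers from the head until we find item or reach null."""
    node = head
    while node is not None:
        if node.data == item:
            return node  # found it
        node = node.next
    return None  # reached null: the item is not on the list

# A list with some letters, head first: R -> E -> S -> A.
head = Node("R", Node("E", Node("S", Node("A"))))

print(sequential_search(head, "E") is not None)  # True: E is on the list
print(sequential_search(head, "X") is not None)  # False: X is not
```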
如果我们在 n 个项目中搜索,最好的情况是立即找到我们想要的项目,如果它位于列表的头部,就会发生这种情况。最糟糕的情况是该项目位于列表的末尾,或者根本不在列表中。那么我们必须遍历所有 n 个项目。因此,顺序搜索的性能是 O(n)。
If we search among n items, the best thing that can happen is to hit on the item we want immediately, which will occur if it is the head of the list. The worst thing that can happen is that the item is the last one on the list or not on the list at all. Then we must go through all n items. Therefore the performance of sequential search is O(n).
如果列表中物品的顺序是随机的,我们就无法提高时间效率。回到一副牌的例子,你就能明白为什么会这样:如果牌组洗得正确,我们就无法提前知道牌会在哪里找到。
There is nothing we can do to improve on that time if the items appear on the list in a random sequence. Going back to a deck of cards, you can see why this is so: if the deck is properly shuffled, there is no way to know in advance where we’ll find our card.
有时人们会在这一点上犯难。如果我们在一大堆纸里找一张,我们可能会厌倦一张接一张地翻找,甚至会想,要是这张纸恰好压在最底下,那我们该有多倒霉!于是我们不再按顺序翻阅,而是偷看最底下。偷看最底下并没有错,错的是认为这样能提高快速完成搜索的几率。如果这堆纸是随机的,那么我们想要的东西没有理由不在最上面、最下面或正中间。任何位置出现的可能性都一样大,所以从最上面开始一直翻到最底下,和其他确保每件物品只检查一次的策略一样好。不过,按照特定顺序搜索通常比杂乱无章地跳来跳去更容易记住我们已经看过什么,这就是为什么我们更愿意坚持顺序搜索。
Sometimes people have trouble with that. If we are looking for a paper among a large pile, we may tire of going one after the other. We may even think of how unlucky we would be should the paper turn out to be at the bottom of the pile! So we stop going through the pile in order and peek at the bottom. There is nothing wrong in peeking at the bottom, but it’s wrong to think that this improves our chances of finishing the search quickly. If the pile is random, then there is no reason why the sought-after item is not the first, last, or one right in the middle. Any position is equally likely, so starting from the top and making our way to the bottom of the pile is as good a strategy as any other that ensures we examine each item exactly once. It is usually simpler to keep track of what we looked at if we work in a specific order, however, than jumping around erratically, and that’s why we prefer to stick with a sequential search.
只要没有理由怀疑搜索项位于特定位置,上述方法就成立。但如果情况并非如此,情况就会发生变化,我们可以利用任何额外的信息来加快搜索速度。
All this holds as long as there is no reason to suspect that the search item is in a particular position. But if this is not true, then things change, and we can take advantage of any extra information we may have to speed up our search.
你可能注意到,在凌乱的书桌上,有些东西会堆到最上面,而有些东西则会滑到最下面。当终于清理这片混乱时,作者曾有过一段愉快的经历:在一堆深埋的东西里,发现了一些他以为早已丢失的东西。其他人可能也有过类似的经历。我们倾向于把常用的东西放在身边;而那些不常用的东西,则越滑越远,伸手难及。
You may have noticed that in an untidy desk, some things find their way to the top of the pile, while some others seem to slip to the bottom. When finally cleaning up the mess, the author has had the pleasant experience of discovering buried deep down in a heap things he believed were long lost. The experience has probably occurred to others as well. We tend to place things we use frequently close; things we have little use for slip further and further out of reach.
假设我们有一堆文件需要处理。这些文件没有任何顺序。我们翻阅着,找到我们需要的文件,处理完后,就把它放到了最上面,而不是原来的位置。然后我们继续处理其他事情。
Suppose we have a pile of documents on which we need to work. The documents are not ordered in any way. We go through the pile, searching for the document we need, processing it, and then placing it not where we found it but instead on the top of the pile. Then we go again with our business.
我们处理所有文档的频率可能并不一致。有些文档我们可能会反复查看,而有些文档我们可能很少访问。如果我们在处理完每份文档后都将其放在最上面,一段时间后我们会发现,最常用的文档会靠近顶部,而访问频率最低的文档则会移到底部。这对我们来说很方便,因为我们可以减少查找常用文档的时间,从而减少总体时间。
It may happen that we do not work with the same frequency on all documents. We may return to some of them again and again, while we may only rarely visit others. If we continue placing every document on the top of the pile after working on it, after some time we’ll find out that the most popular documents will be near the top, while the ones we accessed the least often will have moved toward the bottom. This is convenient for us because we spend less time locating the frequently used documents and thus less time overall.
这体现了一种通用的搜索策略:我们会反复搜索相同的商品,有些商品比其他商品更受欢迎。找到一件商品后,就把它放在前面,这样下次查找时就能更快地找到它。
This suggests a general searching strategy, where we search for the same items repeatedly, and some items are more popular than others. After finding an item, bring it forward so that we’ll be able to find it faster the next time we will look for it.
这种策略的适用性如何?这取决于我们观察到这种受欢迎程度差异的频率。事实证明,这种情况经常发生。我们都知道“富人越来越富,穷人越来越穷”这句话。这不仅仅关乎富人和穷人。同一件事在不同活动领域以令人眼花缭乱的方式出现。这种现象有一个名字,叫做马太效应,取自《马太福音》(25:29)中的以下经文:“因为凡有的,还要加给他,叫他有余;没有的,连他所有的也要夺过来。”
How applicable would such a strategy be? It depends on how often we observe such differences in popularity. It turns out that they happen a lot. We know the saying “the rich get richer, and the poor get poorer.” It is not just about rich and poor people. The same thing appears to a bewildering array of aspects in different fields of activity. The phenomenon has a name, the Matthew effect, after the following verse in the Gospel of Matthew (25:29): “For unto every one that hath shall be given, and he shall have abundance: but from him that hath not shall be taken away even that which he hath.”
这节经文谈论的是物质财富,那么我们不妨先来思考一下财富。假设你有一个大型体育场,可容纳八万人。你测量了体育场内所有人的平均身高,结果可能是1.70米(5英尺7英寸)左右。想象一下,你随机从体育场中抽出一个人,然后把世界上最高的人放进去。平均身高会有所不同吗?即使最高的人身高3米(历史上从未有过这样的身高记录),平均身高仍会保持原来的值——与之前的平均值的差异不到十分之一毫米。
The verse talks about material goods, so let’s think about wealth for a minute. Suppose you have a large stadium, capable of holding 80,000 people. You are able to measure the average height of the people in the stadium. Your result may be something around 1.70 meters (5 feet, 7 inches). Imagine that you take out somebody randomly from the stadium and put in the tallest person in the world. Will the average height differ? Even if the tallest person is 3 meters tall (no such height has ever been recorded), the average height would remain stuck at its previous value—the difference with the previous average being less than a tenth of a millimeter.
现在想象一下,你测量的不是平均身高,而是平均财富。你的8万人的平均财富可能是100万美元(我们假设这是一个富裕的群体)。现在你又换成某人和世界首富一起。这个人的财富可能高达1000亿美元。这会有什么不同吗?是的,会的,而且是巨大的不同。平均财富将从100万美元增加到2249987.5美元,增长一倍以上。我们知道,世界各地的财富分配并不均等,但我们可能没有意识到这种分配有多么不均衡。这种不均衡远比身高等自然指标的分配更为严重。
Imagine now that instead of measuring the average height, you measure the average wealth. The average wealth of your 80,000 people could be $1 million (we are assuming a wealthy cohort). Now you substitute again somebody inside with the richest person in the world. That person could have a wealth of $100 billion. Would this make a difference? Yes, it would—and a big one. The average would increase from $1 million to $2,249,987.5, or more than double. We are aware that wealth is not distributed equally around the world, but we may not be aware of how unequal the distribution is. It is much more unequal than a distribution of natural measures like height.
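The stadium arithmetic can be checked directly (the replaced person is assumed to have exactly the average wealth and height):

```python
n = 80_000
avg_wealth = 1_000_000.0            # dollars; assumed cohort average
richest = 100_000_000_000           # roughly $100 billion

# Replace one average person with the richest person in the world.
new_avg = (n * avg_wealth - avg_wealth + richest) / n
print(new_avg)  # 2249987.5: more than double the original average

# The same swap for height barely moves the average.
avg_height = 1.70
height_shift = (3.0 - avg_height) / n   # change in the average, in meters
print(height_shift)                      # well under a tenth of a millimeter
```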
同样的禀赋差异也存在于许多其他环境中。有很多演员你闻所未闻。也有一些明星出演过多部电影,收入数百万美元。“马太效应”一词由社会学家罗伯特·K·默顿于1968年提出,他观察到,即使贡献相似,著名科学家也会比不太知名的同事获得更多的赞誉。科学家越出名,他们就越出名。
The same difference in endowments occurs in many other settings. There are many actors you have never heard of. And there are a few stars who have appeared in many movies, earning millions of dollars. The term “Matthew effect” was coined by the sociologist Robert K. Merton in 1968, when he observed that famous scientists get more credit for their work over their lesser-known colleagues, even if their contributions are similar. The more famous scientists are, the more famous they will get.
一种语言中的词汇也遵循同样的模式:有些词汇比其他词汇受欢迎得多。呈现这种显著不平等的领域还包括城市规模(特大城市比普通城市大很多倍),以及网站的数量、链接和受欢迎程度(大多数网站只有偶尔的访客,而有些网站则坐拥数百万流量)。这种不平等分配的普遍性,即人口中的少数成员获得不成比例的资源,在过去几年中一直是一个丰富的研究领域。研究人员正在探究此类现象出现背后的原因和规律。1
Words in a language follow the same pattern: some of them are much more popular than others. The list of domains that are characterized by such jarring inequalities includes the size of cities (megacities are many times larger than the average city) and number, links, and popularity of web sites (most sites are honored only by the occasional visitor, while others rake in millions). The prevalence of such unequal distributions, where a few elements of a population obtain a disproportionate amount of resources, has been a rich field of inquiry over the last few years. Researchers are looking into the reasons and laws that underlie the emergence of such phenomena.1
我们搜索的项目可能存在这样的流行度差异。那么,一个利用搜索项目不同流行度的搜索算法,其工作原理类似于将我们找到的每个文档放在最上面:
It is possible that the items in which we are searching exhibit such differences in popularity. Then a search algorithm that will take advantage of the varying popularity of the search items can work much like putting each document that we find at the top of the pile:
下图中,在列表中找到 E 会将其置于最前面:
In the following figure, finding E on the list will bring it to the front:
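A move-to-front step can be sketched in Python (the function name is my own; I assume the figure's list is A, B, C, D, E):

```python
def mtf_search(items, target):
    """Sequential search that moves a found item to the front of the list."""
    i = items.index(target)        # raises ValueError if target is absent
    items.insert(0, items.pop(i))  # remove it and reinsert at the front
    return items

lst = list("ABCDE")
mtf_search(lst, "E")
print(lst)  # ['E', 'A', 'B', 'C', 'D']
```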
对这种移至前端算法的一个可能的批评是,它甚至会把我们很少搜索的项目提升到最前面。这没错,但如果该项目并不热门,当我们搜索其他项目时,它会逐渐移向列表末尾,因为那些项目会移到前面。不过,我们也可以采取一种不那么极端的策略来处理这种情况。与其把找到的每个项目一下子移到最前面,不如只将它向前移动一个位置。这被称为转置法:
A possible criticism of this move-to-front algorithm is that it will promote to the front even an item that we only rarely search for. That is true, but if the item is not popular, it will gradually move toward the end of the list as we search for other items because these items will move to the front. We can take care of the situation, however, by adopting a less extreme strategy. Instead of moving each item we find bang to the front, we can move it just one position forward. This is called the transposition method:
这样,受欢迎的商品就会逐渐移到前面,而不太受欢迎的商品则会移到后面,而不会出现突然的动荡。
In this way, items that are popular will gradually make their way to the front, and less popular items will move to the back, without sudden upheavals.
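The transposition step differs from move-to-front in a single line: swap the found item with its left neighbor instead of jumping it to the front (function name is my own):

```python
def transpose_search(items, target):
    """Sequential search that moves a found item one position forward."""
    i = items.index(target)        # raises ValueError if target is absent
    if i > 0:
        items[i - 1], items[i] = items[i], items[i - 1]
    return items

lst = list("ABCDE")
transpose_search(lst, "E")
print(lst)  # ['A', 'B', 'C', 'E', 'D']
```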
移至前端和转置方法都是自组织搜索的例子;之所以叫这个名字,是因为项目列表会随着我们的搜索而自行组织,并反映所搜索项目的受欢迎程度。根据项目受欢迎程度的差异,节省的成本可能非常可观。顺序搜索的预期性能为 O(n),而使用移至前端方法的自组织搜索可以达到 O(n/lg n) 的性能。如果我们有大约一百万个项目,那么这就是一百万次和大约五万次之间的差异。转置方法可以获得更好的结果,但需要更多时间才能实现。这是因为这两种方法都需要一个"预热期",在此期间热门项目会逐渐显现并移到前面。在移至前端方法中,预热时间很短;在转置方法中,预热时间更长,但我们会得到更好的结果。2
Both the move-to-front and transposition methods are examples of a self-organizing search; the name comes because the list of items is organized as we go with our searches and will reflect the popularity of the searched items. Depending on how the popularity ranges among items, the savings can be significant. While with a sequential search we can expect a performance of O(n), a self-organizing search with the move-to-front method can attain a performance of O(n/lg n). If we have about a million items, this is the difference between 1 million and about 50,000. The transposition method can have even better results, but it requires more time to achieve them. That's because both methods require a "warm-up period" in which popular items will show themselves up and make their way to the front. In the move-to-front method, the warm-up is short; in the transposition method, the warm-up takes longer, but then we get better results.2
著名天文学家约翰尼斯·开普勒(1571–1630)的妻子于 1611 年死于霍乱,此后他便着手再婚。他为人处事有条不紊,从不把事情交给运气。在写给斯特拉伦多夫男爵的一封长信中,他描述了自己遵循的流程。他计划面试 11 位可能的新娘,然后再做决定。他被第五位候选人深深吸引,但他的朋友们反对她出身卑微,最终把他劝离了她。他们建议他转而重新考虑第四位候选人。但随后他被她拒绝了。最终,在考察了全部 11 位候选人之后,开普勒还是娶了第五位:24 岁的苏珊娜·罗伊廷格。
After the celebrated astronomer Johannes Kepler (1571–1630) lost his wife to cholera in 1611, he set out to remarry. A methodical man, he did not leave things to chance. In a long letter to a Baron Strahlendorf, he describes the process he followed. He planned to interview 11 possible brides before making his decision. He was strongly attracted to the fifth candidate, but was swayed against her by his friends, who objected to her lowly status. They advised him to reconsider the fourth candidate instead. But then he was turned down by her. In the end, after examining all 11 candidates, Kepler did marry the fifth one: 24-year-old Susanna Reuttinger.
这个小故事是一个略显牵强的搜索例子;开普勒在一群可能的候选人中寻找理想的伴侣。然而,这个过程中存在一个他开始时可能并未意识到的症结:一旦他拒绝了某位可能的对象,就可能无法再回头。
This little story is a stretched example of a search; Kepler was searching for an ideal match, among a pool of possible candidates. Yet there was a kink in the process that he was probably not aware of when he started: it might not be possible to go back to a possible match after he had rejected it.
我们可以用更现代的术语重新表述这个问题,即寻找最佳方法来决定购买哪辆车。我们事先决定要去一定数量的汽车经销店。而且,我们的自尊心不允许我们在离开后再回到一家汽车经销店。如果我们拒绝了某辆车,面子至关重要,这样我们就不能回去说我们改变了主意。或者,也许在我们离开后,其他人走进来买了这辆车。无论如何,我们都必须在每家经销店做出最终决定,是买下这辆车,还是放弃,不再回来。
We can recast the problem in more contemporary terms, as looking for the best way to decide which car to buy. We have decided beforehand that we will visit a certain number of car dealerships. Also, our amour propre will not allow us to return to a car dealership after we have walked away from it. If we have declined a car, saving face is paramount, so that we cannot go back and say that we changed our mind. Or perhaps somebody else walked in and bought the car after we left. Be it as it may, we have to make a final decision at each dealership, to buy the car or let go, and not come back.
这是最优停止问题的一个例子。我们必须采取行动,同时努力使回报最大化或成本最小化。在我们的例子中,我们要决定何时买车,使这个决定能买到最好的车。如果我们决定得太早,我们可能会买到一辆比我们还没见过的车更差的车。如果我们决定得太晚,我们可能会懊恼地发现,最好的车我们见过,却错过了。什么时候才是停下来做决定的最佳时机?
This is an instance of an optimal stopping problem. We have to take an action, while trying to maximize a reward or minimize a cost. In our example, we want to decide to buy the car, when this decision will result in the best car we can buy. If we decide too early, we may settle on a car that is worse than a car we have not seen yet. If we decide too late, we may discover to our chagrin that we saw, but missed, the best car. When is the optimal time to stop and make a decision?
同样的问题,通常被描述得更冷酷无情,就像秘书问题一样。你想从众多候选人中挑选一位秘书。你可以逐一面试候选人,但每次面试结束后,你都必须决定是否聘用。如果你拒绝了一位候选人,之后就不能改变主意再录用(这位候选人可能太优秀了,会被其他人抢走)。那么,你该如何挑选这位候选人呢?
The same issue is usually described in a more callous way as the secretary problem. You want to select a secretary from a pool of candidates. You can interview the candidates one by one. You must make a decision to hire or not at the end of each interview, however. If you reject a candidate, you cannot later change your mind and make an offer (the candidate might be too good and thus be snapped by somebody else). How will you pick the candidate?
答案出奇地简单。你面试前 37% 的候选人,将他们全部淘汰,但记住其中最好的一位作为基准。37 这个数字看似神奇,其实是因为 1/e ≈ 0.37,其中 e 是欧拉数,约等于 2.7182(我们在第一章中见过欧拉数)。然后你继续面试其余的候选人,在其中第一个比基准更优秀的候选人处停下,这就是你的选择。用算法的形式来表达,假设你有 n 个候选人:
There is a surprisingly simple answer. You go through the first 37 percent of the candidates, rejecting them all, but keeping a tab on the best one among them as your benchmark. The number 37, which seems magical, occurs because 1/e ≈ 0.37, where e is Euler's number, approximately equal to 2.7182 (we saw Euler's number in chapter 1). Then you go through the rest of the candidates. You stop at the first of the rest that is better than your benchmark. That will be your pick. In algorithmic form, if you have n candidates:
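The 37 percent rule can be sketched in Python. The function name is my own, and the numeric scores standing in for candidate quality (higher is better) are purely illustrative:

```python
import math

def pick_best(candidates):
    """Optimal stopping: reject the first n/e candidates, remember the best
    of them as a benchmark, then take the first later candidate who beats it."""
    n = len(candidates)
    cutoff = round(n / math.e)          # about 37 percent of n
    benchmark = max(candidates[:cutoff]) if cutoff else None
    for candidate in candidates[cutoff:]:
        if benchmark is None or candidate > benchmark:
            return candidate
    return candidates[-1]               # forced to settle for the last one

print(pick_best([3, 7, 2, 8, 5, 9, 1, 4, 6, 10, 0]))  # 9: first to beat the benchmark 8
```

Note the run above misses the overall best candidate (10), which the rule accepts as the price of committing early; no other strategy does better more often.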
该算法并非总能找到最佳候选人;毕竟,总体上最佳的候选人可能恰恰是你在前 37% 中确定、却已经淘汰掉的那位基准候选人。可以证明,该算法能在 37%(同样是 1/e)的情况下找到最佳候选人;而且,没有其他方法能在更多情况下找到最佳候选人。换句话说,这个算法已是你所能做到的最好的:尽管它可能在 63% 的情况下无法给出最佳候选人,但你选择遵循的任何其他策略都会在更多情况下失败。
The algorithm will not always find the best candidate; after all, the best candidate overall may be the benchmark candidate you identified in the first 37 percent, and that you have rejected. It can be proved that it will find the best candidate in 37 percent (again, 1/e) of all cases; moreover, there is no other method that will manage to find the best candidate in more cases. In other words, the algorithm is the best you can do: although it may fail to give you the best candidate in 63 percent of the cases, any other strategy you may decide to follow will fail in more cases than that.
回到汽车的话题,假设我们决定去10家汽车经销商。我们应该去前四家,并记下这四家给出的最佳报价,但不要购买。然后,我们开始去剩下的六家经销商,并选择第一家报价比我们记下的报价更优的经销商购买(之后我们跳过剩下的)。我们可能会发现,这六家经销商给出的报价都比我们之前去的四家(但没买)的报价更差。但没有其他策略能让我们更有可能获得最佳交易。
Going back to cars, suppose we decide to visit 10 car dealerships. We should visit the first four and take note of the best offer by these four, without buying. Then we start visiting the remaining six dealerships and we'll buy from the first dealership that gives us an offer better than the one we noted down (we'll then skip the rest). We may discover that all six dealerships make worse offers than the first four that we visited without buying. But no other strategy can give us better odds of getting the best deal.
我们一直假设自己想找到最好的候选人,绝不退而求其次。但如果我们实际上可以接受稍差的结果呢?这意味着,即使理想情况下我们想要最好的秘书或汽车,我们也可以接受别的选择;虽然不如选中最好时那样满意,但我们也会感到高兴。如果我们这样构建问题,那么做出选择的最佳方法是使用与上述相同的算法,但只检查并舍弃前 √n 个候选者。如果我们这样做,我们做出最佳选择的概率会随着候选者数量的增加而增加:随着 n 的增大,我们选中最佳者的概率会趋近于 1(即 100%)。3
We have assumed that we want to find the best possible candidate and will settle for nothing less. But what if we can in fact settle for something less? That means that even though ideally we would want the best secretary or car, we can make do with another choice, with which we may be happy, although not as happy had we picked up the best. If we frame the problem like that, then the best way to make our selection is to use the same algorithm as above, but examining and discarding the square root, √n, of the candidates. If we do that, the probability that we will make the best choice increases with the number of candidates: as n increases, the probability that we'll pick the best goes to 1 (that is, 100 percent).3
我们考虑了针对不同场景的不同搜索方法。所有这些方法的共同点在于,我们检查的项目并非按照特定顺序给出;在自组织搜索中,我们最多会根据热门程度逐渐排序。但如果项目一开始就已排序,情况就会完全不同。
We have considered different ways to search, corresponding to different scenarios. A common thread in all these was that the items that we examine are not given to us in any specific order; at best, we order them gradually by popularity in a self-organized search. The situation changes completely if the items are ordered in the first place.
假设我们有一摞文件夹,每个文件夹都用一个数字标识。这摞文档按照其标识符从小到大排序(数字不必连续)。如果我们有这样一摞文档,并且在寻找具有特定标识符的那一份,那么从第一份开始逐份查到最后一份直到找到为止,就太不明智了。更好的策略是直接翻到这摞文件的中间。然后,我们将中间那份文档上的数字标识符与我们要找的文档的数字进行比较。可能有三种结果:
Let’s say we have a pile of folders, each one of which is identified by a number. The documents in the pile are ordered according to their identifier, from the lowest to highest number (there is no need for the numbers to be consecutive). If we have such a pile and are looking for a document with a particular identifier, it is foolish to start from the first document and make our way to the last until we find the one we are looking for. A much better strategy is to go straight to the middle of the pile. Then we compare the number identifier on the document in the middle to the number of the document that we are looking for. There are three possible outcomes:
无论出现后两种结果中的哪一种,我们剩下的一摞最多只有原来的一半。如果我们从奇数个文档开始,比如 n 个,那么把 n 个文档从中间拆开会得到两部分,每部分有 n/2 个项目(舍弃除法中的小数部分):
In either of the last two outcomes, we are now left with a pile that is at most half the original one. If we start with an odd number of documents, say n, splitting n documents in the middle gives us two parts, each with n/2 items (discarding the fractional part in the division):
如果项目数量为偶数,则拆分它们将得到两部分,一部分包含 n/2 个项目,另一部分包含 n/2 − 1 个项目:
With an even number of items, splitting them will give us two parts, one with n/2 items and another one with n/2 − 1 items:
我们仍然没有找到想要的东西,但处境比之前好多了;现在要检查的项目少了很多。于是我们就这么做:检查剩余项目中间的那份文档,然后重复这个过程。
We have still not found what we were looking for, but we are much better than before; we have much fewer items to go through now. And so we do. We check the middle document of the remaining items and repeat the procedure.
在下一页的图中,您可以看到针对 16 个项目的流程如何展开,其中我们正在寻找项目 135。我们用灰色标出我们搜索的边界和中间项目。
In the figure on the following page, you can see how the process evolves for 16 items, among which we are looking for item 135. We mark out the boundaries inside which we search and the middle item with gray.
一开始,我们的搜索范围是所有项目的集合。我们找到中间的项目,发现它是 384。它大于 135,所以我们丢弃它,以及它右边的所有项目。我们取剩余项目的中间值,结果是 72。它小于 135,所以我们丢弃它,以及它左边的所有项目。我们的搜索范围缩小到只有三个项目。我们取中间的一个,发现它就是我们想要的。我们只用了三次探测就完成了搜索,甚至不需要检查 16 个项目中的 13 个。
In the beginning, the domain of our search is the full set of items. We go to the middle item, which we find out is 384. This is bigger than 135, so we discard it, along with all the items to its right. We take the middle of the remaining items, which turns out to be 72. This is smaller than 135, so we discard it, along with all the items on its left. Our search domain has shrunk to just three items. We take the middle one and find that it is the one we want. It took us only three probes to finish our search, and we did not even need to check 13 of the 16 items.
如果我们要查找不存在的条目,这个过程同样有效。您可以在下图中看到,我们在相同的条目中搜索标签为 520 的条目。
The process will also work if we are looking for something that does not exist. You can see that in the next figure, where we are searching among the same items for one labeled 520.
这次,520 大于 384,所以我们将搜索范围限制在项目的右半部分。我们发现上半部分中间的值为 613,大于 520。然后,我们将搜索范围限制为三个项目,其中中间的是 507。这小于我们的目标值 520。我们将其丢弃,现在只剩下一个项目需要检查,但我们发现它不是我们想要的。因此,我们可以结束搜索并报告搜索失败。我们只进行了四次探测。
This time, 520 is greater than 384, so we restrict our search to the right half of the items. There we find that the middle of the upper half is 613, greater than 520. Then we limit our search to just three items, the middle of which is 507. This is smaller than our target of 520. We discard it and now are left with only one item to check, which we discover is not the one we want. So we can finish our search reporting that it was unsuccessful. It took us only four probes.
我们所描述的方法被称为二分查找,因为每次我们将搜索的值域减半。我们将执行搜索的值域称为搜索空间。基于这个概念,我们可以将二分查找简化为包含以下步骤的算法:
The method we described is called binary search because each time we cut in half the domain of values in which we search. We call the domain of values where we perform our search the search space. Using this concept, we can render the binary search as an algorithm comprising these steps:
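The steps of binary search can be sketched in Python (iterative version; the names and the sample values, which echo the text's examples, are my own):

```python
def binary_search(items, target):
    """Search a sorted list, halving the search space on each probe.
    Returns the target's index, or -1 if it is not present."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2    # middle of the current search space
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            low = mid + 1          # discard the left half
        else:
            high = mid - 1         # discard the right half
    return -1

items = [4, 10, 35, 72, 135, 384, 507, 613]
print(binary_search(items, 135))   # 4
print(binary_search(items, 520))   # -1: not present
```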
这样,我们每次都把要搜索的项目数除以二。这是一种分而治之的方法。它导致重复除法,正如我们在第一章中看到的,这会产生对数。重复除以二得到以二为底的对数。在最坏的情况下,二分查找会不断地对项目做除法,直到无法再除为止,就像我们在失败的查找示例中看到的那样。对于 n 个项目,这种情况发生的次数不会超过 lg n 次;因此,二分查找的复杂度为 O(lg n)。
In this way, we divide by two the items that we have to search. This is a divide-and-conquer method. It results in repeated division, which as we have seen in chapter 1 gives us the logarithm. Repeated division by two gives us the logarithm base two. In the worst case, a binary search will keep dividing and dividing our items, until it cannot divide any further, like we saw in the unsuccessful search example. For n items, this cannot happen more than lg n times; it follows that the complexity of a binary search is O(lg n).
与顺序搜索,甚至与自组织搜索相比,这种改进令人印象深刻。在一百万个项目中搜索,探测次数不会超过 20 次。从另一个角度来看,只需 100 次探测,我们就能在 2^100 个项目中搜索并找到任何一项,这个数目超过 10^30。
The improvement compared to a sequential search, even a self-organized search, is impressive. It will not take more than 20 probes to search among a million items. Viewed from another angle, with a hundred probes we are able to search and find any item among 2^100 items, which is more than one nonillion.
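The probe counts quoted here can be checked with a little arithmetic:

```python
import math

# Worst-case probes for a binary search over n items is about log2(n).
print(math.ceil(math.log2(1_000_000)))  # 20 probes cover a million items

# With 100 probes the searchable range is 2**100 items,
# which exceeds one nonillion (10**30).
print(2**100 > 10**30)  # True
```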
二分查找的效率令人惊叹。它的效率或许只有它的声名远扬才能与之匹敌。它是一种直观的算法。但事实证明,这种简单的方法在计算机程序中很难正确实现。原因并非二分查找算法本身,而是我们将算法转化为实际的编程语言代码的方式,导致程序员们深受其实现中潜藏的隐蔽漏洞的困扰。而且,我们说的可不是新手;即使是世界级的程序员也未能做到完美。4
The efficiency of a binary search is astounding. Its efficiency is probably only matched by its notoriety. It is an intuitive algorithm. But this plain method has proved time and again tricky to get right in a computer program. For reasons that have nothing to do with the binary search algorithm per se, but rather the way we turn algorithms into real computer code in programming language, programmers have been prey to insidious bugs that have crept into their implementations. And we are not talking about rookies; even world-class programmers have failed to get it right.4
为了了解此类错误可能潜伏在哪里,请考虑在算法的第一步中,我们如何在要搜索的项目中找到中间元素。这里有一个简单的想法:第 m 个元素和第 n 个元素的中间元素是 (m + n)/2,如果结果不是自然数,则舍入取整。这是正确的,而且它来自初等数学,因此放之四海而皆准。
To get an idea of where such bugs may lurk, consider how we find the middle element among the items we want to search in the first step of the algorithm. Here is a simple idea: the middle element of the mth and nth elements is (m + n)/2, rounded if the result is not a natural number. This is true, and it follows from elementary mathematics, so it applies everywhere.
但计算机除外。计算机的资源有限,内存就是其中之一。因此,我们不可能在计算机上表示所有想要的数字。有些数字就是太大了。如果计算机对其可处理的数字大小有上限,那么 m 和 n 都应低于该上限。当然,(m + n)/2 也低于该上限。但要计算 (m + n)/2,我们必须先计算 m + n,然后再除以二,而这个和可能大于上限!这称为溢出:超出允许值的范围。于是你会遇到一个你从未想过会咬你一口的错误。结果将不是中间值,而是完全不同的东西。
Except in computers. Computers have limited resources, memory among them. It is not possible, therefore, to represent all the numbers we want on a computer. Some numbers will simply be too big. If the computer has an upper limit on the size of the numbers that it can handle, then both m and n should be below that limit. Of course, (m + n)/2 is below that limit. But to calculate (m + n)/2, we have to calculate m + n and then divide it by two, and that sum may be larger than the upper limit! This is called overflow: going beyond the range of allowable values. So you get a bug that you had never thought would bite you. The result will not be the middle value but instead something else entirely.
如果你发现自己苦苦思索一行代码,却发现它并没有按照你预期的方式运行,请不要灰心。你并非个例。每个人都会遇到这种情况;即使是最优秀的人也会遇到。
Do not despair if you find yourself wretched poring over a line of code that does not do what you think it should do. You are not unique. It happens to all; it happens to the best.
一旦你明白了这一点,解决方案就很简单了。你不把中间值计算为 (m + n)/2,而是计算 m + (n − m)/2。结果相同,但不会发生溢出。回想起来,这似乎很简单。不过,事后看来,人人都是先知。
Once you know about it, the solution is straightforward. You do not calculate the middle as (m + n)/2 but rather m + (n − m)/2. The result is the same, but no overflow occurs. In retrospect it seems simple. In hindsight, though, everybody is a prophet.
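Python's own integers never overflow, but we can mimic 32-bit signed arithmetic to watch the classic bug and verify that computing the middle as m + (n − m)/2 avoids it. The wrapping helper and the sample values are my own:

```python
def as_int32(x):
    """Wrap x the way a 32-bit signed integer would."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

m, n = 1_500_000_000, 2_000_000_000      # both fit comfortably in 32 bits

bad_mid = as_int32(as_int32(m + n) // 2)        # m + n wraps around first
good_mid = as_int32(m + as_int32(n - m) // 2)   # n - m cannot overflow

print(bad_mid)   # a negative number: nonsense as an index
print(good_mid)  # 1750000000, the true middle
```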
我们感兴趣的是算法,而不是编程,但作者想给那些编写或想要编写计算机程序的人一些建议。如果你发现自己苦苦思索一行代码,却发现它并没有按照你的想法运行,不要灰心丧气。如果第二天你意识到,这个 bug 确实一直就在你眼前,也不要沮丧。你怎么可能没发现呢?你并非个例。每个人都会遇到这种情况;即使是最优秀的人也会遇到。
We are interested in algorithms, not programming, here, but let the author share a bit of advice for those who write or want to write computer programs. Do not despair if you find yourself wretched poring over a line of code that does not do what you think it should do. Do not be dismayed if the following day you realize that, indeed, the bug was before your eyes all the time. How could you have failed to see it? You are not unique. It happens to all; it happens to the best.
二分查找要求元素必须经过排序。因此,为了充分利用它的优势,我们应该能够高效地对元素进行排序——这使我们能够进入下一章,学习如何使用算法对元素进行排序。
Binary search requires that the items should be sorted. So to reap its benefits, we should be able to sort items efficiently—which allows us to segue to the next chapter, where we’ll see how we can sort things with algorithms.
美国宪法规定,应每十年进行一次人口普查,以便在联邦各州之间分配税收和代表席位。美国独立战争后的第一次人口普查于1790年进行,此后每十年进行一次。
The US Constitution postulates that a decennial census should take place in order to apportion taxes and representatives among the several states of the union. The first census following the American Revolution took place in 1790, and a census has been done every ten years since.
自1790年以来的一百年间,美国人口迅速增长——从第一次人口普查时略低于400万,到1880年已超过5000万。然而,问题就在于此:统计这些人口花了八年时间。到了1890年,也就是下一次人口普查时,人口数量甚至更多。如果以同样的方式进行统计,很可能要到1900年人口普查才能完成。
In the hundred years since 1790, the United States grew rapidly—from a bit less than 4 million people in the first census, to more than 50 million in 1880. And therein lay a problem: it took eight years to count these people. When the next census year came, in 1890, the population was even bigger. If the count were taken in the same way, it would probably not have been completed before the following census of 1900.
当时,哥伦比亚大学矿业学院的一名年轻毕业生赫尔曼·霍勒里斯(Herman Hollerith,1879 年毕业,时年 19 岁)正在为美国人口普查局工作。他意识到时间紧迫的问题,试图找到一种利用机器加快人口普查进程的方法。霍勒里斯受到列车员在火车票上打孔记录旅客信息的方式的启发,发明了一种用打孔卡片记录人口普查信息的方法。这些卡片随后可以由制表机处理;制表机是一种机电设备,能够读取打孔卡片,并利用其中存储的数据进行统计。
At that time, Herman Hollerith, a young graduate from Columbia University’s School of Mines (he graduated in 1879, when he was 19), was working for the US Census Bureau. Aware of the pressing timing problem, he tried to find a way to speed up the census process using machines. Hollerith was inspired by the way conductors used holes punched in railway tickets to record traveler details; he invented a way in which punched cards could be used to record census details. These cards could then be processed by tabulating machines, electromechanical devices that could read the punched cards and use the data stored in them to make a tally.
霍勒里斯的制表机被用于 1890 年的人口普查,将完成普查所需的时间缩短到六年——当时的结果显示,美国人口已增长到约 6300 万人。霍勒里斯向皇家统计学会展示了他的制表机,并指出"绝不能认为该系统仍处于试验阶段。超过 1 亿张穿孔卡片已在这些机器上被多次计数,这为检验其能力提供了充足的机会。"1 人口普查结束后,霍勒里斯创办了一家名为霍勒里斯电动制表系统(Hollerith Electric Tabulating System)的公司。这家公司经过一系列更名与合并,于 1924 年发展成为国际商业机器公司(IBM)。
Hollerith’s tabulating machine was used in the 1890 census and brought down the time required to complete it to six years—when it came out that the US population had grown to approximately 63 million people. Hollerith presented his tabulating machines to the Royal Statistical Society, noting that “it must not be considered that this system is still in an experimental stage. Over 100,000,000 punched cards have been counted several times over on these machines, and this has afforded ample opportunity to test its capabilities.”1 Following the census, Hollerith started a business, called the Hollerith Electric Tabulating System. This company, via a series of renames and amalgamations, evolved into International Business Machines (IBM) in 1924.
如今,排序无处不在,以至于我们基本察觉不到它的存在。就在几十年前,办公室里还摆满了文件柜,里面放着贴有标签的文件夹,公司办公人员会小心地按要求的顺序排列它们,比如按字母顺序或按时间顺序。相比之下,我们只需点击一下,就能对邮箱中的邮件排序,并可以使用主题、日期、发件人等不同标准。我们的联系人在电子设备中被有序地保存着,而我们对此浑然不觉;而就在几年前,我们还会煞费苦心地确保日记本里的联系人井井有条。
Today sorting is so ubiquitous that it is largely invisible. Just a few decades ago, offices were full of file cabinets containing labeled folders, and corporate office personnel took care to keep them in the required order, like alphabetic or chronological. By contrast, we can sort the messages in our mailboxes just by clicking, and are able to order them using different criteria such as subject, date, and sender. Our contacts are kept sorted in our digital devices without us taking notice; again, a few years ago we would take pains to make sure we had our contacts organized in our diaries.
回顾美国人口普查,排序是办公自动化最早的例子之一;因此,它成为数字计算机最早的应用之一也就不足为奇了。人们开发了许多不同的排序算法。其中一些并未在实践中得到使用,但仍有许多不同的排序算法受到程序员的欢迎,因为它们各有不同的相对优势和劣势。排序是计算机工作中如此基础的一部分,以至于任何关于算法的书籍总会用一部分篇幅来讨论它;然而,正因为排序算法种类繁多,对它们的探索让我们得以领会计算机科学家和程序员工作的一个重要方面。就像工具匠一样,他们手头有一整个工具箱。同一项任务可能有不同的工具。想想不同类型的螺丝刀。我们有一字、十字、内六角和罗伯逊螺丝刀,仅举几例。虽然它们的用途相同,但特定的螺丝需要特定的螺丝刀。有时我们可以勉强用一字螺丝刀拧十字螺丝;但一般来说,我们必须使用合适的工具来完成工作。排序也是如此。虽然所有排序算法都能把事物排好序,但每种算法都更适合特定的用途。
Going back to the US census, sorting was one of the first examples of office automation; it is not surprising, then, that it was one of the first applications of digital computers. A lot of different sorting algorithms have been developed. Some of them are not used in practice, but there are still a number of different sorting algorithms that are popular with programmers because they offer different comparative advantages and disadvantages. Sorting is such a fundamental part of what computers do that any book on algorithms will always devote some part to it, yet exactly because there are many different sorting algorithms, their exploration allows us to appreciate an important aspect of the work of computer scientists and programmers. Like toolsmiths, they have a whole toolbox at their disposal. There may be different tools for the same task. Think of different types of screwdrivers. We have slot, Phillips, Allen, and Robertson drivers, to name but a few. Although all of them have the same objective, particular screws require particular drivers. Sometimes we can make do using a slot driver on a cross screw; in general, though, we must use the proper tool for the job. The same with sorting. While all sorting algorithms put things in order, each is more suitable for particular uses.
在开始探索这些算法之前,我们先来解释一下它们的具体作用。当然,它们会进行排序,但这引出了一个问题:数据排序究竟是什么意思?
Before we start exploring these algorithms, let’s look at some explanations of what exactly these algorithms do. Sure, they sort stuff, but that really begs the question, What exactly do we mean by sorting data?
假设我们有一组相关数据(通常称为记录),其中包含一些我们感兴趣的信息。例如,这些数据可能是我们收件箱中的电子邮件。我们希望重新排列这些数据,使它们按照对我们有用的特定顺序出现。重新排列必须使用数据的某些特定特征。在我们的电子邮件示例中,我们可能希望按发送日期(时间顺序)或发件人姓名(字母顺序)对邮件进行排序。顺序可以是升序(从较早的邮件到较新的邮件),也可以是降序(从最近的邮件回溯)。排序过程的输出必须与输入的数据相同;用技术术语来说,这必须是原始数据的排列,即原始数据的顺序不同,但没有以任何其他方式更改。
We assume that we have a group of related data—usually called records—that contains some information that is of interest to us. For example, such data could be the emails in our in-box. We want to rearrange these data so that they appear in a specific order that is useful to us. The rearrangement has to take place using some specific feature or features of the data. In our email illustration, we may want to order our messages by delivery date, chronologically, or the sender’s name, alphabetically. The order may be ascending, from earlier messages to more recent ones, or descending, from recent messages going back in time. The output of the sorting process must be the same data as the input; in technical terms, this must be a permutation of the original data—that is, the original data in different order, but not changed in any other way.
我们用来对数据进行排序的特征通常称为键。当我们认为无法将其分解为多个部分时,键可以是原子的;当键由多个特征组成时,键可以是复合的。如果我们想按递送日期对电子邮件进行排序,这就是一个原子键。(我们不关心日期是否可以拆分成年、月、日,甚至可能包含确切的发送时间)。但我们可能希望按发件人姓名对电子邮件进行排序,然后对于来自同一发件人的所有邮件,按发送日期进行排序。日期和发件人的组合构成了我们排序的复合键。
The feature we are using to sort our data is usually called a key. A key may be atomic, when we consider that we cannot decompose it to parts, or it may be composite, when the key consists of more than a single feature. If we want to sort our emails by delivery date, this is an atomic key (we do not care that a date can be broken up in year, month, and day, and may also contain the exact time of delivery). But we may want to sort our emails by the sender’s name, and then for all the messages from the same sender, order them by delivery date. The combination of date and sender forms the composite key of our sort.
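A composite key maps naturally onto a tuple in Python's `sorted` (the sample emails are illustrative):

```python
emails = [
    {"sender": "bob",   "date": "2019-05-02", "subject": "Re: budget"},
    {"sender": "alice", "date": "2019-05-03", "subject": "minutes"},
    {"sender": "bob",   "date": "2019-04-30", "subject": "budget"},
]

# Composite key: sender first, then delivery date within each sender.
by_sender_then_date = sorted(emails, key=lambda e: (e["sender"], e["date"]))
print([(e["sender"], e["date"]) for e in by_sender_then_date])
# [('alice', '2019-05-03'), ('bob', '2019-04-30'), ('bob', '2019-05-02')]
```

The output is a permutation of the input records: they are reordered, not changed.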
虽然它们的目标都是一样的,但特定的螺丝需要特定的螺丝刀……排序也是如此。虽然所有排序算法都能把事物排好序,但每种算法都更适合特定的用途。
Although all of them have the same objective, particular screws require particular drivers. . . . The same with sorting. While all sorting algorithms put things in order, each is more suitable for particular uses.
Any kind of feature can be used as a key for sorting, as long as its values can be ordered. Obviously this holds true for numbers. If we want to sort sales data by the number of sales per item, the number of sales is an integer key. When our keys are textual, such as senders' emails, the ordering that we usually want is lexicographical. Sorting algorithms need to know how to compare our data so as to deduce their order, but any valid way to compare will do.
We'll start our exploration of sorting methods with two algorithms that may be familiar: they are probably the most intuitive, and even people with no knowledge of algorithms use them when they have to sort a pile of stuff.
Our task is to sort the following items:
Admittedly, if you take a look at the task, it’s pretty trivial; these are the numbers from one to ten. But keeping things simple will allow us to concentrate on the logic of the sorting task.
First, we go through all the items and find the minimum among them. We take it from where we found it and place it first. The minimum of the items is 1, so this must be put into the first position. As this position is already taken, we have to do something with 4, which is currently at the first position; we cannot just throw it away. What we can do is to swap it with the minimum: move the minimum item to the first position and move the item previously in the first position to the position left vacant by moving the minimum. So we go from here, where the minimum is painted black,
to here,
where the minimum is painted white, to indicate that it is in its correct, ordered position.
Now we do exactly the same thing with all the numbers, save for the minimum we found—that is, all the numbers from the second position onward (the gray numbers). We find their minimum, which is 2, and swap it again with the first of the unsorted numbers, 6:
Again we do the same. We deal with the items from the third one onward; we find the minimum, which is 3, and swap it with the item currently in the third place, 10:
If we continue this way, item 4 will stay put because it is already in its correct place and we’ll go on to place 5 in its sorted position:
At each point we go through fewer and fewer items to find their minimum. In the end, we’ll find the minimum of the last two items, and once we’ve done that, all our items will be sorted.
This sorting method is called selection sort because each time, we select the minimum of the unsorted items and place it where it should be. Like all the sorting algorithms that we will examine, selection sort has no problem with ties—that is, elements that have the same order. If we find more than one minimum when we examine the unsorted items, we just pick any one of them as our working minimum. We'll find the tied item next time around and put it next to its equal.
Selection sort is a straightforward algorithm. Is it also a good one? If we pay attention to what we are doing, we are going from the beginning to the end of the items that we want to sort, and each time we try to find the minimum of the remaining unsorted items. If we have n items, the complexity of selection sort is O(n²). This is not bad in itself; such complexity is not prohibitive, and we can tackle large problems (read: sort a lot of items) in a reasonable amount of time.
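The whole procedure fits in a few lines of Python. A minimal sketch, starting from the arrangement of our one-to-ten example (the positions after the first few are our guess, since the figures are not reproduced here):

```python
def selection_sort(items):
    """Sort a list in place by repeatedly selecting the minimum of the
    unsorted part and swapping it into the first unsorted position."""
    n = len(items)
    for i in range(n - 1):
        # Find the position of the minimum among items[i:].
        min_pos = i
        for j in range(i + 1, n):
            if items[j] < items[min_pos]:
                min_pos = j
        # Swap the minimum with the first unsorted item.
        items[i], items[min_pos] = items[min_pos], items[i]
    return items

print(selection_sort([4, 6, 10, 1, 3, 2, 9, 7, 5, 8]))
```

The two nested loops, each running over up to n items, are where the O(n²) behavior comes from.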
The thing is, exactly because sorting is so important, algorithms do exist that are faster than that. So although selection sort is not inherently bad, we usually prefer to use other, more advanced algorithms when we have a lot of items at hand. At the same time, selection sort is not only easy to understand by humans but is also easy to implement on a computer in an efficient way. So it is clearly not of just academic interest; it is really used in practice.
The same can be said for another simple sorting algorithm that we’ll describe now. Like selection sort, this is a sorting method that is easy to understand beyond computers. In fact, this is the way we may sort our hand in a card game.
Imagine that you play a game of cards in which you are dealt ten cards (for example, you could be playing Rummy). As you take one card after the other, you want to sort them in your hand. We assume that the card rank, from the lowest to highest, is:
In fact, in many games (and Rummy), the ace can be the lowest- and highest-ranking card, but we’ll assume that there is a single order only.
You are dealt each card, so you start with one card in your hand and nine cards to follow:
Now you get a second card; it is a six:
Six is fine next to four, so you leave it there and take another card, which turns out to be two:
This time, so as to keep your hand in order, you need to move two to the left of four, thus pushing four and six one position to the right. You do that before you are dealt another card, a three:
You insert the three between the two and four, and see the next card, a nine. This is already in the right place in your hand.
You may continue with your hand—for instance, 7, Q, J, 8, and 5. In the end, you will end up with a sorted hand.
Each new card was inserted in the right place in relation to the previous cards that had been dealt. This way of sorting is called insertion sort for that reason, and it works for any kind of object, not just playing cards.
Like selection sort, insertion sort is straightforward to implement. It turns out that it has the same complexity: O(n²). It does have a distinct characteristic, though: as in our playing cards example, you don't need to know the items in advance before you sort them. In effect, you sort them as you get them. That means that you can use insertion sort when the items to be sorted are somehow streamed to you live. We met this kind of algorithm, which works live as it were, when we discussed the tournament problem in graphs in chapter 2, and we called it an online algorithm. If we have to sort an unknown number of items, or if we must be able to stop immediately and provide a sorted list whenever we are suddenly called to do so, then insertion sort is the way to go.2
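A minimal Python sketch in the same online spirit: items are consumed one at a time, as if dealt from a deck, and the hand is kept sorted throughout. The deal follows the card example above, with jack and queen represented as 11 and 12.

```python
def insertion_sort(stream):
    """Sort items as they arrive: insert each new item in its proper
    place among the items already in hand."""
    hand = []
    for item in stream:
        hand.append(item)            # take the new card into the hand
        i = len(hand) - 1
        # Shift larger items one place right until the spot is found.
        while i > 0 and hand[i - 1] > item:
            hand[i] = hand[i - 1]
            i -= 1
        hand[i] = item
    return hand

# The deal: 4, 6, 2, 3, 9, then 7, Q (12), J (11), 8, and 5.
print(insertion_sort([4, 6, 2, 3, 9, 7, 12, 11, 8, 5]))
```

At any point during the loop, `hand` is a sorted list of everything dealt so far, which is exactly what lets insertion sort stop early and still hand back a sorted result.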
Let us now return to Hollerith. His tabulating machines did not use selection sort, nor insertion sort. They actually used a precursor of a method still in use today, called radix sort. As a tribute to the first machine-enabled sorting application, it is worth spending some time on how radix sort works. It is also interesting because this is a sorting method in which the items to be sorted are not really compared to each other. At least not entirely, as we will see. What’s more, radix sort is not just of historical interest, as it performs fantastically. What’s not to like in a venerable yet practical algorithm?3
The easiest way to see a radix sort is by using playing cards again. Suppose that we have a full deck of cards that has been shuffled and want to sort it. One way to do it is to form 13 piles, one for each rank value. We go through the deck, taking each card and placing it in the respective pile. We’ll get 13 piles of four cards each: a pile containing all the aces, another one containing all the twos, and so on.
Then we collect the cards, pile by pile, taking care to put each pile we pick at the bottom of the cards we are collecting. In this way we’ll have all the cards in our hands, partially sorted. The first four cards will be aces, the next four twos, and all the way to the kings.
We now create four new piles, one for each suit. We’ll go through the cards, taking each card and putting it into the corresponding pile. We’ll get four piles of suits. Because the values were already sorted, in each pile we will have all cards of a single suit, in rank order.
To finish sorting our cards, we only need to collect them pile by pile.
This is the essence of radix sort. We did not sort the cards by fully comparing cards between them. We performed partial comparisons, first by rank, and then by suit.
Of course, if radix sort were applicable only to cards, it would not merit our attention here. We can see how radix sort works with integer numbers. Suppose that we have the following group of integers:
We make sure that all the integers have the same number of digits. So we pad the numbers with zeros on the left if necessary, turning 5 to 005, 97 to 097, and 53 to 053. We go through all our numbers and triage them by their rightmost digit. We use that digit to place them in ten piles:
We lightened up the numbers’ fill color to indicate that they are partially sorted; each pile contains the numbers with the same rightmost digit. All the numbers in the first pile end in zero, and in the second pile they end in one, up to the last pile, where they end in nine. We now collect the ten piles, starting from the first on the left and adding piles at the bottom (taking care not to shuffle the numbers in any way). Then we redistribute them into ten piles using the second digit from the right and get:
This time all the numbers in the first pile have their second from the right digit equal to zero; in the second pile they have their second from the right digit equal to one, and similarly for the other piles. At the same time, the items in each pile are sorted by their last digit because that’s what we did when we piled them the first time.
We finish by collecting the piles and redistributing the numbers, using the third digit from the right this time:
Now the items in each pile start with the same digit and are sorted by their second digit, as a result of the previous piling, and their last digit, as a result of the first piling. To get our sorted numbers, we just collect the piles one final time.
Radix sort can work with words or any sequence of alphanumeric characters as well as integers. In computer science, we call a sequence of alphanumeric characters and symbols a string. Radix sort works with strings; the strings may be composed of digits, as in our example, but they may be any kind of strings. The number of piles for alphabetic strings will be equal to the number of distinct characters comprising the alphabet (for instance, 26 piles for English), but the operations will be exactly the same. What is distinctive in radix sort is that even when the strings are composed entirely of digits, we treat them as alphanumeric sequences, not as numbers. If you check how we worked, we did not care about the values of the numbers; we were working each time with one particular digit from the number, in the same way that we would work by extracting characters from a word, going from right to left. That is why radix sort is sometimes called a string sorting method.
Do not let this fool you into thinking that radix sort can order strings while the other sorting methods we present here cannot. All of them can. We can sort strings as long as the symbols that compose them can themselves be ordered. Human names are strings to computers, and we can sort them because letters are ordered alphabetically and names can be compared lexicographically. The appellation "string sorting" comes from the fact that radix sort treats all keys, even numbers, as strings. The other sorting methods in this chapter treat numbers as numbers and strings as strings, and work by comparing numbers or strings, as appropriate. It is only for convenience that we use numbers as keys in our examples of the different sorting algorithms.
The way radix sort works, by processing the items to be sorted digit by digit (or character by character), makes it efficient. If we have n items to sort, and the items consist of w digits or characters, then the complexity of the algorithm is O(wn). That is much better than the O(n²) complexity required by selection and insertion sorts.
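A sketch of least-significant-digit radix sort in Python for non-negative integers. Instead of physically padding numbers with zeros, it extracts the digit at each position arithmetically; the numbers here are illustrative.

```python
def radix_sort(numbers, width):
    """Least-significant-digit radix sort for non-negative integers
    with at most `width` digits, using ten piles per pass."""
    items = list(numbers)
    for d in range(width):                    # d = 0 is the rightmost digit
        piles = [[] for _ in range(10)]
        for item in items:
            digit = (item // 10 ** d) % 10    # digit d positions from the right
            piles[digit].append(item)
        # Collect the piles in order, never disturbing a pile's internal order.
        items = [item for pile in piles for item in pile]
    return items

print(radix_sort([97, 5, 53, 170, 802, 24, 66, 311], 3))
```

Each pass is stable: items that land in the same pile keep their previous relative order, which is why later passes never undo the work of earlier ones. With w passes over n items, the running time is O(wn).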
And so we come full circle to tabulating machines. A tabulating machine worked in a similar way, sorting punched cards. Imagine that we have a deck of cards where each card has ten columns; punched holes in each column indicate a digit. The machine could recognize the holes in each column, thus figuring out the corresponding digit. An operator put the cards in the machine, and the machine placed the cards in ten output bins depending on their last column—that is, the least significant digit. The operator collected the cards from the output bins, being careful not to mix them in any way, and fed them again into the machine, which this time distributed them into the output bins using their next-to-last column, the digit next to the least significant one. After repeating the process ten times, the operator could collect an ordered pile of cards. Voilà.
Suppose we have a group of kids milling around in a yard (perhaps at school) and want to put them in line, from the shortest to tallest. Initially we ask them to get in line, which they will do, in whatever order they want:
Now we pick a kid at random:
We tell the kids to move around so that all kids who are shorter than the chosen one should move to the left of them and all the rest should move to their right. In the following figure we show where the kid we picked ended up, and you can check that those kids who are taller are to the right and those who are shorter are to the left:
We did not ask the kids to put themselves in the right order. We only asked them to move relative to the kid we chose. So they formed two groups, on the left and right of the chosen one. The kids in these groups are not in any shorter-to-taller sequence. We do know, however, that one kid is certainly in their final position in the line we are trying to form: the very kid we picked. All the kids on the left are shorter, and all the kids on the right are at least as tall. We call the kid we picked the pivot because the rest of the kids have moved around them.
As a visual aid, we will follow the convention of painting white the kids who are put in the right position. When we select a kid as a pivot, we will paint them black; when we have moved the rest of the kids around the pivot, we will use a small black hat to indicate the final position of the pivot (it’s white because it’s in the right position, with a black top, to indicate that it was the pivot).
Now we shift our attention to one of the two groups, left or right—say the left. Again we pick a pivot in that group at random:
We ask the kids in that group to do the same thing as before: move so that if they are shorter they end up to the left of the pivot, and otherwise to the right. We will again have two new, smaller groups, which you can see below. One of them is a group of one, so that kid is in their right place in that trivial group. Then we have the rest of the kids to the right of the second pivot. The second pivot is in the right place, with all the shorter kids to the left and all the rest to the right. This group on the right extends to the first pivot. We then pick a new, third pivot from that group.
When we tell the kids in the group to move as before, according to how tall they are with respect to the third pivot, two smaller groups will be formed. We focus our attention on the one on the left. We do as before: we pick a pivot, our fourth, and ask the kids in this group of three to move around it.
When they do, the pivot ends up being the first of the three, so we have a remaining group of two kids on the pivot’s right. We pick one of the pair as a pivot, and the other kid will move, if needed, to their right.
It turns out that this kid does not need to move at all. Right now, we have managed to put about half the kids in order; there are two groups that we had left when we were dealing with previous pivots. We go back to the first of these two groups from the left in order to pick a pivot there and repeat the process.
Again, no movement around was necessary and so we go to the last group of unsorted kids to pick a pivot.
We get a group of one, on the pivot’s right, and a group of two, on the pivot’s left. We focus on the left group and select one of the two as our last pivot.
We are done. All the kids are in order of height.
Let’s take stock of what we did. We managed to put the kids in order by putting one kid in their right place each time. To do that, we only needed to ask the rest of the kids to move around them. This will always work, of course, not just with kids but also with anything that we may want to sort. If we have a group of numbers that we can sort, we can follow a similar process, picking up a number at random and moving around the rest of the numbers so that those that are smaller end up before our chosen number, and the rest end up after it. We’ll repeat the process in the smaller groups that are formed; in the end, we’ll have all the numbers in the right order. This is the process that underlies the quicksort algorithm.
Quicksort is based on the observation that if we manage to position one element in the correct position with respect to all the rest, whatever that position might be, and then repeat this with the remaining elements, we’ll end up with all the elements in their correct positions. If we think back on what we did with selection sort, there we also took each element and positioned it correctly with respect to all the rest, but the element we took was always the minimum of the remaining ones. This is a crucial difference: in quicksort, we should not pick the minimum of the remaining elements as our pivot. Let’s see what happens if we do so.
If we start again with the same group of kids, we’ll get the shortest of all kids as our pivot. That one will go to the beginning of the line, and all the rest will move behind the pivot.
Then we’ll get the kid who is immediately taller than the first one and put them second in line. All the rest of the kids will go, again, behind the pivot.
Doing the same thing with the third kid gets us to this point:
But notice how this looks eerily like a selection sort, as we are filling in the line from the left to the right with the shortest of the remaining kids.
We have not said how we choose an element as a pivot each time. We now see that we should not choose the minimum of the elements. First, choosing the minimum requires effort; we would really have to go and find the minimum each time. Second, the algorithm then behaves like one we already know, so there would not be much point in doing it.
The truth is that quicksort is better than selection sort because “normally” (we’ll see what normally means shortly) we’ll pick as our pivot something that partitions our data in some more equitable way. Choosing the minimum element creates the most unequal partition: nothing on the left of the pivot, and all the rest to the right of the pivot. Each time, then, we just manage to position the pivot itself.
If the partition is better, then we do not just manage to position the pivot. We also manage to position all the elements to the left of the pivot in their correct positions with respect to the elements to the right of the pivot. Yes, they are not in their final positions yet. But overall, they are in better positions than before. So we have one element, the pivot, in the best position possible, and the other elements better positioned than before.
This has an important effect on the performance of quicksort: its expected complexity is O(n log n), which is way better than O(n²). If we want to sort 1 million items, n² works out to 10¹², a trillion, while n log n is about 20 million.
It all hinges on picking the proper pivot. Searching for a pivot that would partition our data in the best possible way each time does not make sense; it would require searching to find the right pivot, so this would add complexity to the process. A good strategy, then, is to leave it to chance. Just pick a pivot at random and use what you picked to partition the data.
To see why this is a good strategy, let us see why it is not a bad one. It would be a bad one if it led to a behavior like the one we just saw, where quicksort degenerates to selection sort. This would happen if we pick each time as a pivot an item that does not really partition the elements. This can happen if we pick each time the minimum or maximum of the items (the situation is exactly the same). The overall probability of all this happening can be found to be 2ⁿ⁻¹/n!.
A probability such as 2ⁿ⁻¹/n! is hard to grasp because it is abysmally low. To put it into context, if you take a deck of 52 playing cards and shuffle it randomly, the probability that the deck will end up being in order is 1/52!. This is about the same as flipping a coin and coming out heads 226 times in a row. When you multiply by 2⁵¹, things are not improved much. The number 2⁵¹/52! is about 3 × 10⁻⁵³. To put the matter in cosmic perspective, the earth is composed of about 10⁵⁰ atoms. If you and a friend of yours were to pick independently an atom from the earth, the probability that you would pick the same atom would be 1/10⁵⁰, actually greater than 2⁵¹/52!, the probability of pathological quicksort on a deck of cards.4
That explains why "normally" we pick a pivot in a more equitable way, as we said above. Excepting a streak of bad luck of cosmic proportions, we do not expect to pick the worst pivot possible each time. The odds actually work in our favor: it is by picking pivots at random that we expect to get a complexity of O(n log n). It is theoretically possible to do worse than that, but the possibility is only of academic interest. Quicksort will be as fast as we expect it to be for all practical purposes.
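A minimal Python sketch of quicksort with a randomly chosen pivot. For clarity it builds new lists when partitioning; real implementations usually rearrange the items in place, but the logic is the same. The heights are invented stand-ins for the kids in the example.

```python
import random

def quicksort(items):
    """Sort by picking a random pivot, partitioning around it, and
    recursively sorting the two parts."""
    if len(items) <= 1:
        return list(items)                      # nothing left to do
    pivot = random.choice(items)
    smaller = [x for x in items if x < pivot]   # end up left of the pivot
    equal = [x for x in items if x == pivot]    # the pivot and any ties
    larger = [x for x in items if x > pivot]    # end up right of the pivot
    # The pivot (with its ties) lands in its final position in between.
    return quicksort(smaller) + equal + quicksort(larger)

heights = [140, 135, 150, 130, 155, 145, 125, 160]
print(quicksort(heights))
```

Whatever pivots chance hands us, the result is always correct; only the running time varies, and, as discussed above, a pathological run is astronomically unlikely.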
Quicksort was developed by the British computer scientist Tony Hoare in 1959–1960.5 It is probably the most popular sorting algorithm today because when implemented correctly, it outruns all others. It is also the first algorithm that we see whose behavior is not entirely deterministic. Although it will always sort correctly, we cannot guarantee that it will always have the same runtime performance. We can guarantee that it is extremely unlikely that it will exhibit pathological behavior. This is an important concept, which brings us to the so-called randomized algorithms: those algorithms that use an element of chance in their operation. This runs contrary to our intuition; we expect algorithms to be the ultimate deterministic beasts, slavishly following the instructions we lay down for them on a preordained path. And yet randomized algorithms have blossomed in recent years, as it has turned out that chance can help us solve problems that remain intractable to more standard approaches.6
We've met radix sort, which essentially sorts items by distribution: in each round through the data, it places each item in a correct pile. Now we'll meet another sorting method, which sorts items by merging stuff together instead of splitting it apart. The method is called merge sort.
Merge sort starts by admitting to a limited capability for sorting; imagine that we are unable to sort our items if they are given to us in any random arrangement. We are only able to do the following: if we are given two groups of items, and each group is already sorted, we can merge them together and get a single, sorted group.
Randomized algorithms have blossomed in recent years, as it has turned out that chance can help us solve problems that remain intractable to more standard approaches.
For example, say we have the following two groups, one per row (although in our example the two groups have the same number of items, there is no need for the groups to be equal in size):
As you can see, each of the two groups is already sorted. We want to merge them in order to create a single sorted group. This is really simple. We check the first item of both groups. We see that 15 is smaller than 21, so this will be the first item of our third group:
We examine again the first elements of the two groups, and this time 21 from the second group is smaller than 27 from the first group. So we take it and append it to the third group.
If we continue in this way, we’ll take 27 from the first group and then 35 from the second group, adding them to the end of the third group:
Now 51 is smaller than 59, and 56 is smaller than 59. As we have already moved 35 from the second group to the third, in the end we'll have moved three items in a row from the second group to the third. That is fine because in this way we keep the items in the third group sorted. There is no reason why the first two groups should diminish at the same rate.
We return to the first group, as 59 is smaller than 69, so we add it to the third group:
Next, by moving 69 to the third group we empty the second group completely:
最后,我们把第一组中剩余的元素移到第三组——它们肯定比第三组中的最后一个元素大,否则它们早就被移过去了。现在我们的元素已经完全排序:
We finish by moving the last remaining elements of the first group to the third group—they are definitely larger than the last element of the third group; otherwise we would have moved them there earlier. Our items are completely sorted now:
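上面逐步演示的合并过程可以用几行 Python 勾勒出来(这是我们为说明而写的草图,函数名 merge 是假设的,并非书中给出):
The merging walkthrough above can be sketched in a few lines of Python (our own illustrative sketch; the function name `merge` is our choice, not the book's):

```python
def merge(first, second):
    """Merge two sorted lists into one sorted list."""
    merged = []
    i, j = 0, 0
    # Repeatedly compare the first remaining item of each group
    # and move the smaller one to the merged group.
    while i < len(first) and j < len(second):
        if first[i] <= second[j]:
            merged.append(first[i])
            i += 1
        else:
            merged.append(second[j])
            j += 1
    # One group is exhausted; everything left in the other group is
    # larger than all items moved so far, so append it whole.
    merged.extend(first[i:])
    merged.extend(second[j:])
    return merged
```

用与上面演示一致的两组数字调用它,会重现同样的合并结果。
Calling it on two groups consistent with the walkthrough reproduces the merged result above.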
能从两个已排序的组生成一个已排序的组固然不错,但这似乎并不能解决我们对一组未排序元素进行排序的问题。确实如此,但它仍然是解决方案的重要组成部分。
It’s nice to have a way of producing a sorted group from two sorted groups, but this does not seem to solve our problem of sorting a single group of unsorted items. It is true it does not, yet it is an important component of the solution.
现在设想我们有一群人。我们把一组物品交给其中一人去排序。这个人不知道怎么排序,但他们知道:如果手里有物品的两个已排序的部分,就能得到最终排序好的一组。于是他们这样做:把这组物品一分为二,交给另外两个人。他们对第一个人说:“拿着这组物品去排序。排好后还给我。”又对第二个人说了同样的话。然后他们等待。
Imagine now that we have a group of people. We give to one of them a group of items to sort. That person does not know how to sort them, but they do know that if somehow they had two sorted parts of the items, they could produce a final sorted group. So what they do is this: they split the group in two and pass it on to two other people. They say to the first of them, “Take this group and sort it. Once you are done, return it to me.” They say the same thing to the second person. Then they wait.
虽然我们的第一个联系人不知道如何对物品进行排序,但如果两个新联系人设法对自己的物品进行排序并归还,那么第一个联系人就会将最终排序好的物品归还给我们。但是,另外两个联系人的知识并不比我们的第一个联系人多——他们不知道如何排序,而只知道如何使用上述算法合并排序好的物品——那么,这真的取得了什么成果吗?
Although our first point of contact does not know how to sort the items, if the two new contacts manage somehow to sort their own parts and return them, then the first person would return to us the final, completely sorted group. But the two other contacts know no more than our initial contact—they don’t know how to sort but rather only how to merge sorted stuff using the algorithm above—so has anything really been achieved?
答案是肯定的,只要他们做同样的事情:他们将自己的部分分成两部分,每个人将自己的部分委托给另外两个人,等待他们执行命令并为他们提供两个排序好的部分。
The answer is yes, provided that they do the same: they split their part in two, and each delegates their part to two other persons, waiting for them to do their bidding and provide them with two sorted parts.
这看起来像是一场终极的推卸责任游戏,但请看我们用一个例子把它展开时会发生什么。我们从数字 95、59、15、27、82、56、35、51、21 和 79 开始。我们把它们交给 Alice(A),她把它们分成两份,传给 Bob(B)和 Carol(C)。你可以在下面倒置树的第一层看到这一点:
This seems like the ultimate pass-the-buck game, but look at what happens if we try to see it unfold with an example. We start with the numbers 95, 59, 15, 27, 82, 56, 35, 51, 21, and 79. We give them to Alice (A), who splits them in two, and passes them to Bob (B) and Carol (C). You can see that in the first level of the upside-down tree below:
然后,鲍勃将他的数字一分为二,分别传给戴夫(D)和伊芙(E)。同样,卡罗尔也把她的数字一分为二,传给弗兰克(F)和格蕾丝(G)。我们这群人物继续互相推诿。戴夫把他的数字分给海蒂(H)和伊万(I);伊芙把她的两个数字分给朱迪(J)和凯伦(K);弗兰克和格蕾丝则分别分给利奥(L)和马洛里(M)、尼克(N)和奥利维亚(O)。最后,海蒂把她的一对数字分给佩吉(P)和昆汀(Q),而利奥把他的一对数字分给罗伯特(R)和西比尔(S)。
Then Bob splits his numbers into two, and passes them on to Dave (D) and Eve (E). Similarly, Carol splits her numbers, and passes them on to Frank (F) and Grace (G). Our cast of characters continue passing the buck. Dave divides his numbers between Heidi (H) and Ivan (I); Eve distributes her two numbers to Judy (J) and Karen (K); Frank and Grace split theirs between Leo (L) and Mallory (M), and Nick (N) and Olivia (O), respectively. Finally, Heidi splits her pair between Peggy (P) and Quentin (Q), while Leo splits his pair between Robert (R) and Sybil (S).
处在树叶位置的人其实没什么可做的。佩吉和昆汀每人收到一个数字,并被告知要对它排序。但单个数字按定义就是有序的:它与自身有序。所以佩吉和昆汀只需把他们的数字还给海蒂。同样,伊万、朱迪、凯伦、罗伯特、西比尔、马洛里、尼克和奥利维亚也归还了他们收到的数字。
The people at the leaves of the tree have really nothing to do. Peggy and Quentin receive a number each, and they are told to sort it. But a single number is sorted by definition: it is in order with itself. So Peggy and Quentin just give their number back to Heidi. Also, Ivan, Judy, Karen, Robert, Sybil, Mallory, Nick, and Olivia return the numbers they received.
现在让我们来看下一页的树。在这棵树中,我们将从顶部的叶子(所以它看起来像一棵正常的树,而不是倒置的)移动到底部的根。让我们专注于海蒂。她得到了两个数字,每个数字都(显然)是排序的。海蒂知道如何合并两个已排序的组来得到一个组,这样她就可以用 95 和 59 得到 59, 95。然后,她将这个已排序的两组数字还给戴夫。利奥也会采取同样的行动:他会得到 35 和 56,这两个数字本身已经排序好了,并且知道如何将这两个数字按顺序排列,得到 35, 56,然后把这两个数字还给弗兰克。
Now let’s move to the tree on the next page. In this tree we’ll move from the leaves, at the top (so this looks like a normal tree, not upside down), to the root at the bottom. Let’s concentrate on Heidi. She gets back two numbers, each one of which is (trivially) sorted. Heidi knows how to merge two sorted groups to produce a single group so she can use 95 and 59 to make 59, 95. She then returns this sorted group of two to Dave. Leo will act the same: he will get 35 and 56, which are already sorted (by themselves), and knows how to put these two in order and create 35, 56, which he returns to Frank.
戴夫对最初收到的数字 95、59、15 一无所知,现在从海蒂那里得到了 59、95,从伊万那里得到了 15。这两组数字都已经排序,这意味着戴夫可以将它们合并,得到 15、59、95。同样,弗兰克从利奥那里得到了 35、56,从马洛里那里得到了 51,可以得到 35、51、56。
Dave, who was clueless about the numbers 95, 59, 15 that he had initially received, now gets 59, 95 from Heidi and 15 from Ivan. Both of these groups are already sorted, which means that Dave can merge them to create 15, 59, 95. In the same way, Frank gets 35, 56 from Leo and 51 from Mallory, and can produce 35, 51, 56.
如果每个人都采取相同的方式,当数字到达 Alice 手中时,她会收到两个排序好的列表,一个来自 Carol,一个来自 Bob。她会将这两个列表合并起来,创建最终的排序列表。
If everybody acts in the same way, when the numbers reach Alice, she will get two sorted lists, one from Carol and one from Bob. She will merge them to create the final sorted list.
这两棵树是归并排序的精髓。我们尽可能地委托排序,直到无法进行任何排序,因为单个元素已经按照定义排序。然后,我们合并越来越大的组,直到将所有元素合并到一个最终排序好的组中。
These two trees are the essence behind merge sort. We delegate the sorting as much as we can, to the point that no sorting can take place because lone items are already sorted by definition. Then we merge larger and larger groups, until we absorb all elements in a single, final, sorted group.
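这种层层委托正是递归;下面是归并排序的一个极简示意(我们自己的草图,复用前面描述的合并过程):
This chain of delegation is exactly recursion. Here is a minimal sketch of merge sort in Python (our own sketch, reusing the merging procedure described earlier):

```python
def merge(first, second):
    """Merge two sorted lists into one sorted list."""
    merged = []
    i, j = 0, 0
    while i < len(first) and j < len(second):
        if first[i] <= second[j]:
            merged.append(first[i])
            i += 1
        else:
            merged.append(second[j])
            j += 1
    merged.extend(first[i:])
    merged.extend(second[j:])
    return merged

def merge_sort(items):
    """Sort a list by delegating halves, then merging the results."""
    # A lone item (or an empty group) is already sorted by definition.
    if len(items) <= 1:
        return list(items)
    middle = len(items) // 2
    # Each "person" splits the group in two and delegates both halves,
    # then merges the two sorted parts they get back.
    left = merge_sort(items[:middle])
    right = merge_sort(items[middle:])
    return merge(left, right)
```

用树中的十个数字调用 merge_sort,就会重现 Alice 最终得到的排序列表。
Calling merge_sort on the ten numbers from the tree reproduces the sorted list that Alice ends up with.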
我们对这些人物的智力要求极低。在第一棵树中你可以看到,Eve 从 Bob 那里得到的一组数字碰巧已经排好序了:27、82。这无关紧要。她不会停下来检查它们是否需要排序——我们也不希望她这样做,因为这样的检查很耗时。她只是把它们拆开并传下去,之后再取回来合并,得到她原本就有的结果。这没关系;从全局来看,Eve、Judy 和 Karen 之间这段多余的三人舞不会影响算法的性能。
The smarts that we require from our characters are minimal. You can see in the first tree that Eve got from Bob a group of numbers that as it happened was already sorted: 27, 82. It does not matter. She does not stop to check whether they need sorting or not—and we don’t want her to because such a check would take time. She just splits and passes them down. She will get them back and merge them to produce what she already got. That’s all right; in the large scheme of things, this gratuitous pas de trois between Eve, Judy, and Karen won’t affect the performance of the algorithm.
归并排序的复杂度与快速排序一样好,都是 O(n log n)。这意味着我们有两种性能相同的算法。在实践中,程序员可能会根据其他因素选择其中之一。通常,快速排序程序比归并排序程序运行得更快,因为它们在编程语言中的具体实现更快。归并排序先拆分数据再合并,这意味着这项工作可以并行化,从而可以利用计算机集群对海量数据进行排序,其中每台计算机都扮演着我们上面那些人类排序员的角色。
The complexity of merge sort is as good as that of quicksort, O(n log n). That means that we have two algorithms with the same performance. In practice, programmers may choose one or the other depending on additional factors. Usually quicksort programs run faster than merge sort ones because their concrete implementation in a programming language is faster. Merge sort splits the data before merging them, which means that the work can be parallelized, so that vast amounts of data can be sorted by a computer cluster, where each computer acts like our human sorters above.
归并排序与计算机一样古老。它的发明者是匈牙利裔美国人诺伊曼·亚诺什·拉约什(Neumann János Lajos),人们更熟悉他的美国名字约翰·冯·诺伊曼(John von Neumann,1903–1957)。1945 年,他用墨水为最早的数字计算机之一——电子离散变量自动计算机(EDVAC)——写了一份长达 23 页的手稿。第一页的顶部用铅笔写着“绝密”(TOP SECRET)字样(后来被擦掉),因为在 1945 年,计算机相关工作因与军方的联系而被列为机密。这份手稿的主题是计算机的一项非数值应用:排序。冯·诺伊曼描述的方法正是我们如今所说的归并排序。7
Merge sort is as old as computers. Its inventor was a Hungarian American, Neumann János Lajos, better known under his American name, John von Neumann (1903–1957). In 1945, he wrote a manuscript, in ink, 23 pages long, for one of the first digital computers, the Electronic Discrete Variable Automatic Computer, or EDVAC for short. At the top of the first page, the phrase “TOP SECRET” was penciled in (and later erased), as work on computers was classified in 1945 due to its connections with the military. The subject of the paper was a nonnumerical application of computers: sorting. The method that von Neumann described was what we now call merge sort.7
如果你未满一定年龄,HotBot、Lycos、Excite、AltaVista 和 Infoseek 这些词对你来说毫无意义,即使它们有意义,也可能指的不是搜索引擎。然而,它们都曾在某个时刻争夺我们的注意力,试图让我们把它们当作通往网络的门户。
If you are below a certain age, the words HotBot, Lycos, Excite, AltaVista, and Infoseek mean nothing to you, or if they do mean something, they probably do not mean search engines. Yet all of them were vying for our attention at some point or other, trying to get us to use them as the gateway to the web.
如今,这已成为历史,搜索引擎市场由两大服务主导:Alphabet 旗下的谷歌和微软旗下的必应。众多竞争方案在新市场中爆发式涌现,并随后整合,这种模式在历史上许多行业都曾出现过。搜索引擎领域最引人注目的是,我们知道这场演变中的一个重要因素是谷歌的惊人成功,而谷歌的成功又基于其创始人发明的算法。谷歌的创始人是拉里·佩奇和谢尔盖·布林,他们当时是斯坦福大学的博士研究生。他们将算法命名为 PageRank,取自佩奇(Page)的名字(而不是像人们可能预期的那样,取自“网页”(page)和排名(rank))。
This is history now, as the search engine landscape is dominated by two services, Google, run by Alphabet, and Bing, run by Microsoft. The explosion of many competing solutions in a new market, and their subsequent consolidation, is a pattern that we have witnessed in many industries in history. What is remarkable in the search engine space is that we know that a large factor in the evolution is the phenomenal success of Google, which in turn was based on an algorithm that its founders invented. The founders were Larry Page and Sergey Brin, doctoral candidates at Stanford University, and they named their algorithm PageRank, after Page (and not after “page” and rank, as one might expect).
在开始描述 PageRank 之前,我们需要了解搜索引擎的具体工作。这实际上包括两件事。首先,它们会抓取网页,读取并索引所有能找到的网页。这样,当我们输入搜索词时,搜索引擎会查看它们存储在已抓取网页上的数据,并找到与我们的查询匹配的网页。因此,如果我们搜索“气候变化”,搜索引擎就会搜索它们收集的数据,找到包含该搜索词的网页。
Before we embark on a description of PageRank, we need to understand what exactly search engines do. This is actually two things. First, they crawl the web, reading and indexing all the web pages they can come across. In this way, when we type in a search term, search engines look into the data they have stored on the crawled web pages and find the ones that match our query. So if we search for “climate change,” the search engines will search through the data they have amassed to find the web pages that contain this search term.
如果我们的搜索词描述的是一个热门话题,那么搜索结果可能会非常多。在撰写本文时,在谷歌上搜索“气候变化”会返回超过 7 亿条结果;当你阅读这些文字时,这个数字可能会有所不同,但你大概能了解其规模。这就引出了搜索引擎做的第二件事。它们必须妥善呈现搜索结果,让与我们要找的内容更相关的结果排在前面,而不太可能引起我们兴趣的结果排在后面。如果你想了解有关气候变化的事实,你自然希望来自联合国、美国国家航空航天局(NASA)或维基百科的结果排在最前面。如果排在第一位的是一个解释地平说学会对该主题看法的网页,你会相当吃惊。在可能与你的查询相关的数以亿计的网页中,许多内容微不足道,有些夸夸其谈,还有一些则完全是胡言乱语。你需要锁定那些切中要点且权威的网页。
If our search term describes a popular topic, the results can be numerous. At the time of this writing, the query “climate change” on Google returns more than 700 million results; this number may be different when you read these lines, but you get an idea of the scale. This brings us to the second thing that search engines do. They must present the search results so that those that are more pertinent to what we are looking for appear first, and those that are less likely to interest us appear later. If you are trying to learn the facts about climate change, you would expect to see results from the United Nations, National Aeronautics and Space Administration (NASA), or Wikipedia come up on top. You would be rather surprised if the top result was a web page explaining the view of the Flat Earth Society on the topic. From the hundreds of millions of web pages that may be related to your query, many will be trivial; others may be bloviating, and yet others will be utter nonsense. You want to home in on those that are to the point and authoritative.
当谷歌搜索引擎问世时(笔者年纪够大,应该还记得),人们(包括笔者本人)开始从其他老牌、现已停产的搜索引擎转向这个新来者,因为它的搜索结果更优质,而且速度更快。谷歌网页简洁明了,只包含相关信息,而不是像过去那样充斥着各种花哨的装饰,这也起了作用。我们先不谈第二个因素,尽管它很有启发性(谷歌明白用户关心的是优质快速的搜索结果,而不是花哨的装饰),而是讨论第一个因素。谷歌是如何快速地提供比其他搜索引擎更优质搜索结果的?
When the Google search engine arrived on the scene (the author is old enough to remember), people (the author included) started switching to the newcomer from other, older, now-extinct search engines because its results were better and they arrived faster. It also helped that the Google web page was plain, containing only relevant information, instead of being flush with all sorts of paraphernalia, which had been the fashion. We’ll leave aside the second factor, illuminating though it is (Google understood that users cared for good and fast search results, not for bells and whistles), and deal with the first. How could Google deliver better results than the others, fast?
如果网络规模较小,我们可以创建一个目录,并安排编辑人员整理目录,并为其条目(即网页)分配重要性。但网络的规模阻碍了这种方法,尽管在意识到网络规模使其无法完成之前,也曾有过类似的尝试。
If the web were small, we could create a catalog of it, and have editors to curate the catalog and assign an importance to its entries—the web pages. But the scale of the web precludes such an approach, although there were such attempts before it became obvious that the size of the web would make this an impossible task.
网络由网页组成,这些网页通过链接相互连接。我们称这些链接为超链接;包含指向文本其他部分的交叉引用的文本或其他文本被称为超文本。超文本的概念早于万维网。第一份通过互连文档来组织知识的系统描述是由美国工程师 Vannevar Bush 撰写的,于 1945 年出现在《大西洋月刊》上。万维网,或者简称为 Web,是由英国计算机科学家 Tim Berners-Lee 于 20 世纪 80 年代开发的。Berners-Lee 当时在瑞士日内瓦郊外的欧洲核子研究中心 CERN 工作,他希望创建一个系统来帮助科学家共享文档和信息。他们可以通过在线提供文档并将其文档中的链接添加到其他可在线获取的文档来做到这一点。通过人们添加新页面,网络已经并将继续有机地发展。网页作者编写页面内容并链接到与他们编写的页面内容相关的现有页面。
The web consists of web pages, linked to each other through links. We call these links hyperlinks; text that contains such cross-references to other parts of the text or other texts is called hypertext. The notion of hypertext predates the web. The first description of a system of organizing knowledge by interlinking documents was written by the US engineer Vannevar Bush and appeared in 1945 in the Atlantic. The World Wide Web, or simply the web as it became known, was developed by the British computer scientist Tim Berners-Lee in the 1980s. Berners-Lee was working at CERN, the European Organization for Nuclear Research, outside Geneva, Switzerland, and wanted to create a system to help scientists share documents and information. They could do that by making their documents available online and also adding links from their documents to other documents that were available online. The web has grown, and continues to grow, organically by people adding new pages. Authors of web pages write the content of the pages and link to existing pages that are relevant to the content of the pages they write.
假设您是一篇在线文章的作者,该文章概述了气候变化对您所在国家的影响。在文章中,当您介绍主题时,您可能希望读者导航到一个您认为是该领域权威来源的网页,因此您添加了该网页的链接。这样,您既可以帮助读者更深入地探究主题,又能提升文章的庄重感,因为您引用了另一个您信任的网页的论据来佐证自己的观点。
Imagine you are the author of an online article that provides an overview of the effects of climate change in your country. In the article, as you introduce the topic, you may want to let your readers navigate to a web page that you believe is an authoritative source on the matter, so you add a link to that web page. In this way you help your readers by allowing them to delve deeper into the subject, while at the same time you add gravitas to your own writing because you substantiate your statements by those of another web page that you trust.
像您一样,有很多人正在撰写关于气候变化对其国家或地区影响的在线文章。他们每个人可能都希望链接到他们认为该主题权威的来源。这些在线文章会生成超链接,指向相关信息来源。
There are many people like you, writing their own online articles on the effects of climate change in their countries or regions. Each one of them may also want to link to what they believe is an authoritative source on the topic. Hyperlinks will emanate from these online articles to point to relevant sources of information.
NASA之所以会在“气候变化”搜索中名列前茅,是因为许多作者(每人都撰写了自己的文章)决定在文章中添加指向NASA气候变化网页的超链接。虽然作者的选择各不相同,但很可能许多人都选择了同一个页面,例如NASA的页面。因此,相对于其他网页,这个关于气候变化的页面被评为重要页面也就不足为奇了。
The reason why NASA might come up on top in a search for climate change is that lots of authors, each one writing their own article, decided to place a hyperlink to the NASA web page on climate change. Authors made their own choices individually, but it is likely that many chose the same page, such as, for instance, NASA’s page. It therefore makes sense that this page on climate change should be judged important, relative to other web pages.
整个系统就像一种民主制度。网页作者将自己的页面链接到其他页面。一个网页获得的链接越多,越多的作者认为它足够重要,可以从自己的页面链接到它,因此它的整体重要性也就越高。
The whole system acts as a kind of democracy. Authors of web pages link their pages to other pages. The more links that a web page accrues, the more authors judged it important enough to link to it from their own page, and thus the more important it becomes overall.
然而,这与我们通常实践的民主存在概念上的差异。并非所有写出来的文章都同等重要。有些文章出现在更有声望的网站上,有些则不然。一篇只有少数人阅读的博客文章,其分量远不及一篇拥有数十万读者的在线出版物文章。这表明,我们不应该仅凭指向某个网页的链接数量来衡量其重要性。谁在指向这个网页也很重要,而不仅仅是有多少个链接。可以合理地预期,来自知名网页的链接比来自冷门网站的链接更有分量。虽然你不应该以封面来评判一本书,但著名作家的推荐比不知名书评人的好评更重要。从一个页面到另一个页面的每个链接都相当于第一个页面对第二个页面的背书,而背书的分量取决于背书者的地位。同时,如果一个页面链接到许多其他页面,那么它的背书可以说应该在接收它的页面之间进行分配。
There is, though, a conceptual difference from democracy as we usually practice it. Not all of these articles that are written are equal. Some of them appear on more prestigious web sites than others. An article on a blog that is read by a handful of people carries less weight than an article in an online publication that rakes in hundreds of thousands of readers. This indicates that we should not consider just the number of links pointing to a web page in order to gauge its importance. Who is pointing to a web page is also significant, not just how many. It is reasonable to expect that a link from a prestigious web page carries more weight than a link from an obscure site. Although you should not judge a book by its cover, an endorsement by a prominent author is more important than a good review by an unknown reviewer. Every link from one page to another page acts as an endorsement from the first page to the second, and the weight of the endorsement depends on the status of the endorser. At the same time, if a page links to many other pages, its endorsement should be divided, as it were, among the pages that receive it.
由超链接连接起来的页面集合构成了一个庞大的图,包含数十亿个页面以及它们之间数量更多的链接。每个网页都是图中的一个节点,从一个页面到另一个页面的每条链接都是这个巨大图中的一条有向边。PageRank 背后的基本洞见是:按照我们刚刚概述的推理,我们可以利用网络图的结构来确定每个网页的重要性。更准确地说,我们可以用一个数字来刻画每个网页的重要性。这个数字,我们称之为该页面的 pagerank,将衡量一个网页相对于其他网页的重要性。网页越重要,它的 pagerank 就越高。PageRank 算法把这一洞见的推论应用到了代表整个网络的超大规模的图上。
The set of pages linked by hyperlinks forms an enormous graph, containing billions of pages and many more links between them. Every web page is a node in the graph. Every link from one page to another is a directed edge in this huge graph. The fundamental insight behind PageRank is that following the reasoning we have just outlined, we can use the structure of the web graph to give us the importance of each web page. To be more precise, we can get the importance of each page through a number. This number, which we will call its pagerank, will measure the significance of a web page related to the other web pages. The more important a web page is, the higher its pagerank will be. The PageRank algorithm follows the ramification of this insight on a humongous scale, on the graph representing the whole web.
当我们浏览某个网页时,该网页上的链接会指向与我们当前浏览的网页相关的其他网页。链接的存在本身就表明其末尾的网页很重要——否则网页的作者一开始就不会链接到它。请考虑下面的示例图,它表示一小部分相互链接的网页:
When we are on a web page, the links on that page point to other pages that are relevant to the page we are currently browsing. The very existence of the link indicates that the web page at the end of the link is important—otherwise the author of the web page would not link to it in the first place. Consider the example graph below, representing a small set of web pages that link to each other:
在这样的图中,我们将指向网页的链接称为反向链接;延伸开来,我们也将指向网页的页面称为反向链接。这样,网页 3 的反向链接就是指向它的边、它的入边,以及它们发出的节点:网页 2、4 和 5。在本章中,我们将关注由网页组成的图表,我们将互换使用术语“节点”和“页面”。
In such a graph, we call the links that point to a web page backlinks; by extension, we will also call the pages that point to a web page backlinks. In this way, the backlinks of web page 3 are the edges pointing to it, its incoming edges, as well as the nodes from which they emanate: web pages 2, 4, and 5. As in this chapter we will be concerned with graphs that are made up of web pages, we will be using the terms “node” and “page” interchangeably.
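本章的示例图可以写成一张出链表;求某个页面的反向链接,只需扫描所有页面的出链(代码是我们为说明而写的草图,边集取自本例):
The example graph can be written down as a list of outgoing links; finding the backlinks of a page then amounts to scanning every page's outgoing links (our own illustrative sketch; the edge set is the one used in this example):

```python
# Outgoing links of each page in the example graph: page 1 links to
# page 2, page 2 links to pages 3 and 5, and so on.
links = {
    1: [2],
    2: [3, 5],
    3: [1, 4, 5],
    4: [1, 3],
    5: [2, 3, 4],
}

def backlinks(graph, target):
    """Return the pages whose outgoing links point to the target page."""
    return [page for page, outgoing in graph.items() if target in outgoing]
```

例如,backlinks(links, 3) 给出第 3 页的反向链接:第 2、4、5 页。
For instance, backlinks(links, 3) gives the backlinks of page 3: pages 2, 4, and 5.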
我们将基于两个基本原则,构建一个用于确定每个网页重要性的算法:
We will build an algorithm for finding the importance of each web page based on two basic principles:

1. 一个网页的重要性来自指向它的网页(即它的反向链接)的重要性。
1. The importance of a web page derives from the importance of the pages that link to it, that is, its backlinks.
2. 每个网页将自身的重要性平均分配给它所链接到的页面。
2. Each web page divides its importance evenly among the pages it links to.
假设我们想求第 3 页的重要性。我们看到它的反向链接是第 2、4、5 页。我们依次考虑每一个,并假设我们已经知道它们各自的重要性。第 2 页将其重要性分给第 3 页和第 5 页,因此把一半的重要性给了第 3 页。第 4 页也把它的重要性分给两个页面(第 3 页和第 1 页),因此也把一半给了第 3 页。最后,第 5 页将其重要性分给第 2、3、4 页,因此把三分之一给了第 3 页。为了少打些字,我们用 r_i 表示第 i 页的重要性;r 代表排名(rank)。那么第 3 页的重要性为:
Say we want to find the importance of page 3. We saw that its backlinks are 2, 4, and 5. We take each one of them in turn and assume we know their own significance. Page 2 divides its importance over pages 3 and 5, and therefore will give half its importance to page 3. Page 4 also divides its importance over two pages, 3 and 1, and hence will give half its significance to page 3. Finally, page 5 divides its importance over pages 2, 3, and 4, and thus will give a third of its importance to page 3. To save typing, let us denote by r_i the importance of page i; r will stand for rank. Then the importance of page 3 will be:

r_3 = r_2/2 + r_4/2 + r_5/3
一般来说,如果我们想找出某个网页的重要性,并且我们知道每个反向链接的重要性,那么很容易找到我们想要的东西:将每个反向链接页面的重要性除以其链接到的网页数量,然后将结果添加到该页面的其他反向链接的贡献中。
In general, if we want to find out the importance of a certain web page and we know the importance of each backlink, it is easy to find what we are looking for: divide the importance of each backlink page by the number of web pages it links to and add the result to the contributions of the other backlinks of the page.
您可以将重要性的计算视为网页之间的投票竞赛。每个投票页面都具有一定的重要性,可以将其作为对其认为重要的网页的投票。如果它只认为一个网页重要,它就将票投给该网页。但是,如果它认为多个网页重要,那么它会将投票分成两部分,分别投给这些网页。因此,如果一个网页想将三个网页评为重要,它会将三分之一的选票投给每个网页。网页会将选票分配给哪些网页?分配给其超链接末尾的网页,也就是它链接到的网页。那么,网页的重要性是如何得出的呢?取决于其反向链接的重要性。
You may think of the calculation of importance as a voting contest between web pages. Each voting page has some significance, which it can use as a vote for those web pages that it deems important. If it considers only one web page as important, it just gives its vote to that web page. But if it considers more than one web page as significant, then it splits its vote and gives a part of the vote to each of these web pages. Therefore, if a web page wants to vote three web pages as being important, it will give to each one of them one-third of its vote. To which pages will a web page apportion its vote? To those at the end of its hyperlinks—that is, to those to which it links. And how is the importance of a web page derived? From the importance of its backlinks.
这两个原则确实赋予了网页排名某种民主的氛围。没有一个单一的权威机构能决定什么最重要。如果其他网页认为某个网页重要,并用它们的链接投票,那么该网页就是重要的。然而,与大多数现实世界选举中适用的一人一票原则不同,这里并非所有网页都拥有同等的投票权。一个网页的投票分量取决于它自身的重要性——而这同样是由其他网页决定的。
The two principles do endow some aura of democracy to the ranking of web pages. There is no single authority that decides what is most significant. A web page is important if other web pages think it is important, and they vote with their links. In contrast with the one person, one vote principle that holds in most real-world elections, however, not all web pages have equal votes here. The votes of a web page depend on how important it is—which, again, is determined by the other web pages.
这看起来像是诡辩,因为它实际上告诉我们,要计算一个网页的重要性,我们必须计算其反向链接的重要性。如果按照同样的推理,要计算每个反向链接的重要性,我们必须计算该反向链接的反向链接的重要性。然后,这个过程似乎越来越倒退,从一个反向链接到另一个反向链接,最后,我们不知道如何从头开始计算网页的重要性。更糟糕的是,我们可能会发现自己在兜圈子。在我们的例子中,要计算第 3 页的重要性,我们需要第 2、4 和 5 页的重要性。要计算第 2 页的重要性,我们需要第 1 页(以及第 5 页,但我们先不谈这一点)的重要性。要计算第 1 页的重要性,我们需要第 4 页的重要性,而要计算第 4 页的重要性,我们需要知道第 3 页的重要性。我们又回到了原点。
This may seem like casuistry because in effect it tells us that to find the importance of a web page, we must find the importance of its backlinks. If we follow the same reasoning, to find the importance of each of its backlinks, we must find the importance of that backlink’s backlinks. Then the process seems to regress more and more, from backlinks to backlinks, and in the end, we are left without knowing how to calculate the significance of the web page from where we started. Worse, we may find out that we run in circles. In our example, to calculate the importance of page 3, we need the importance of each of pages 2, 4, and 5. To calculate the importance of page 2, we need the importance of page 1 (and page 5, but let us leave that aside for a bit). To calculate the importance of page 1, we need the importance of page 4, and to find that, we need to know the importance of page 3. We are back where we started.
为了解决这个问题,我们假设在计算网页的重要性之前,我们赋予所有网页同等的重要性。用投票的比喻来说,我们赋予每个网页恰好一票。投票开始时,每个页面都会按照我们描述的方式投票,将其投票分散到其链接的页面。然后,每个页面都会收到来自其所有反向链接的投票。投票转移如下:
To see how we get out of the problem, let us assume that before we begin calculating the importance of the web pages, we give them all equal significance. In terms of our voting metaphor, we give each web page exactly one vote. When the voting starts, each one of the pages will vote in the way we described, spreading its vote to the pages to which it links. Each page will then receive votes from all its backlinks. The transfer of votes will look like this:
页面 1 将它的一票投给它唯一链接到的页面 2。页面 2 将它的一票分成两份,分别投给页面 3 和页面 5。页面 3 将它的一票分成三份,分别投给页面 1、4 和 5。页面 4 和页面 5 也按同样的方式投票。
Page 1 sends its vote to page 2, the only page it links to. Page 2 divides its vote into two parts, and sends one to page 3 and one to page 5. Page 3 divides its vote into three parts and sends one to each of pages 1, 4, and 5. Pages 4 and 5 vote using the same method.
投票结束后,每个页面将把从其反向链接收到的选票(或选票的一部分)加总起来。例如,页面 1 收到了来自页面 3 和页面 4 的选票,因此将获得 1/2 + 1/3 = 5/6 票;而页面 3 收到了来自页面 2、4 和 5 的选票,因此将获得 1/2 + 1/2 + 1/3 = 4/3 票。我们看到,与开始时相比,页面 1 的选票份额减少了,而页面 3 的份额增加了。
Once voting is over, each page will calculate the total from the sum of the votes, or fractions of the votes, it has received from its backlinks. For example, page 1, having received votes from pages 3 and 4, will have 1/2 + 1/3 = 5/6 votes, while page 3, having received votes from pages 2, 4, and 5, will have 1/2 + 1/2 + 1/3 = 4/3 votes. We see that page 1 decreased its share of votes compared to where it started, while page 3 increased it.
现在让我们稍微改变一下设置。在投票开始前,我们不再给每个页面一票,而是给每个页面 1/5 票,这样所有选票加起来正好是一票。一般来说,如果我们有 n 个页面,我们就给每个页面 1/n 票。其余过程完全相同。所有网页的总重要性等于一,而且重要性同样均匀地分布在所有网页上。
Now let us change the setup a little bit. Instead of giving each page one vote before the voting starts, we give each page 1/5 of a vote so that all votes sum up to one. In general, if we have n pages, we give 1/n votes to each one of them. The rest of the process is exactly the same. The overall importance of all web pages is equal to one, and the importance is again distributed evenly over all the web pages.
投票结束后,每个网页的重要性都会发生变化。如果我们进行计算就会发现,它们不再全部等于 1/5,而是依次为 0.17、0.27、0.27、0.13 和 0.17。网页 2 和 3 的重要性有所提升,而网页 1、4 和 5 的重要性有所下降。所有网页的重要性总和仍为一。
After the voting ends, the importance of each web page will have changed. Instead of having all of them equal to 1/5, if we do the calculations, we will find that they will be equal to 0.17, 0.27, 0.27, 0.13, and 0.17 for each of the pages in turn. Web pages 2 and 3 have gained in importance, while web pages 1, 4, and 5 have lost importance. The total significance of all web pages sums up to one.
现在,我们可以开始新一轮投票,规则完全相同。页面会将获得的投票分配给它们所链接的页面。第二轮投票结束时,每个页面将计算其投票数,以确定其累计重要性排名。计算后,新的重要性值将分别为 0.16、0.22、0.26、0.14 和 0.22。
We can now start another voting round, with exactly the same rules. The pages will spread the votes they have gathered to the pages to which they link. At the end of this second round, each page will count its votes to determine its standing in terms of accumulated importance. After the calculations, the new importance values will be 0.16, 0.22, 0.26, 0.14, and 0.22.
我们将再次执行完全相同的流程。实际上,我们会一遍又一遍地重复投票。如果这样做,投票结果(即分配给每个页面的重要性)将如下表所示变化,该表显示了每轮投票后的初始值和结果:
We’ll do exactly the same process again. In fact, we’ll repeat the voting again and again. If we do that, the votes—that is, the importance apportioned to each page—will evolve as in the following table, which shows the initial values and results after each voting round:
| 轮次 Round | 第 1 页 Page 1 | 第 2 页 Page 2 | 第 3 页 Page 3 | 第 4 页 Page 4 | 第 5 页 Page 5 |
|---|---|---|---|---|---|
| 开始 Start | 0.20 | 0.20 | 0.20 | 0.20 | 0.20 |
| 1 | 0.17 | 0.27 | 0.27 | 0.13 | 0.17 |
| 2 | 0.16 | 0.22 | 0.26 | 0.14 | 0.22 |
| 3 | 0.16 | 0.23 | 0.26 | 0.16 | 0.20 |
| 4 | 0.17 | 0.22 | 0.26 | 0.15 | 0.20 |
| 5 | 0.16 | 0.23 | 0.25 | 0.15 | 0.20 |
| 6 | 0.16 | 0.23 | 0.26 | 0.15 | 0.20 |
如果我们继续进行第七轮投票,我们会发现情况与第六轮投票相比保持不变。投票数以及网页的重要性将保持不变。这就得到了最终结果。网页的排名是:第 3 页最重要,其次是第 2 页,然后是第 5 页,然后是第 1 页,最后是第 4 页。
If we go on to perform another, seventh voting round, we’ll discover that the situation will remain unchanged with respect to the sixth voting round. The votes, and therefore the importance of the web pages, will remain the same. This then gives us our final result. The ranking of the web pages is that page 3 is the most important, followed by page 2, then page 5, then page 1, and last comes page 4.
让我们退后一步,回顾一下我们做了什么。我们首先从两个原则入手,只要知道每个反向链接的重要性,它们就为我们提供了计算网页重要性的规则。开始之前,我们将所有 n 个网页设置为同等重要,均等于 1/n。然后,我们通过将每个网页从其反向链接获得的份额相加来计算其重要性。这会为每个网页给出新的重要性值,与初始值不同。我们从这些新值开始重复该过程,又得到另一组值。重复若干次之后,我们发现情况趋于稳定:重要性度量在两次重复之间不再变化。此时我们停止,并报告所得到的值。
Let’s step back and reflect on what we did. We started with two principles that give us rules for calculating the importance of a web page, provided we know the importance of each of its backlinks. Before we start, we set up all n web pages with equal importance, equal to 1/n. Then we calculate the significance of each web page by summing the shares it gets from its backlinks. This gives us new values for the significance of each web page, different from the values we started with. We repeat the process beginning with these values and find another set of values. After a number of repetitions of this process, the situation stabilizes: the measure of importance does not change from one repetition to the next. At this point we stop and report the values that we have found.
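上述反复进行的投票轮次可以直接写成代码;下面是一个草图(我们自己的示意,使用本章的示例图),它会收敛到表中的最终值:
The repeated voting rounds can be written out directly. Below is a sketch (our own illustration, using the example graph of this chapter) that converges to the final values in the table:

```python
# Outgoing links of each page in the example graph.
links = {
    1: [2],
    2: [3, 5],
    3: [1, 4, 5],
    4: [1, 3],
    5: [2, 3, 4],
}

def voting_rounds(graph, rounds=50):
    """Repeat the voting process until it is (practically) stable."""
    n = len(graph)
    # Every page starts with equal importance, 1/n.
    rank = {page: 1 / n for page in graph}
    for _ in range(rounds):
        new_rank = {page: 0.0 for page in graph}
        # Each page splits its current importance evenly
        # over the pages it links to.
        for page, outgoing in graph.items():
            share = rank[page] / len(outgoing)
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank
```

经过足够多的轮次,各页面的值(保留两位小数)稳定为 0.16、0.23、0.26、0.15、0.20,且总和仍为一。
After enough rounds, the values per page (to two decimals) settle at 0.16, 0.23, 0.26, 0.15, and 0.20, and they still sum to one.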
当然,问题在于我们刚才描述的方法是否适用于一般情况,而不是我们选择的特定示例。此外,它是否能产生合理的结果?
The question of course is whether the approach that we have just described works in general and not in the particular example that we chose. Moreover, does it produce sensible results?
从反向链接的重要性计算页面重要性的方法有一个简洁的表述。我们从描述网页之间链接的图开始。我们可以用一个数字矩阵来表示图,称之为邻接矩阵。构造方法很简单:我们创建一个行数和列数都等于图中节点数的矩阵,然后在每个对应一条链接的交点处填 1,其余交点均填 0。本例的邻接矩阵为(行表示链接出发的页面,列表示链接指向的页面):
The method of calculating the importance of a page from the importance of its backlinks has an elegant formulation. We start from the graph that describes the links between our web pages. We can represent a graph by using a matrix of numbers, which we call its adjacency matrix. The construction is straightforward. We create a matrix with as many rows and columns as the nodes in the graph. Then we put a one at each intersection that corresponds to a link and a zero at all other intersections. The adjacency matrix for our example, with rows for the pages the links come from and columns for the pages they point to, is:

        1  2  3  4  5
    1 | 0  1  0  0  0
    2 | 0  0  1  0  1
    3 | 1  0  0  1  1
    4 | 1  0  1  0  0
    5 | 0  1  1  1  0
我们还可以用单独的一行,即一个向量,来表示各网页的重要性:
We can also represent the importance of the web pages using a single row, or vector:

    r = [ r_1  r_2  r_3  r_4  r_5 ]

开始时,向量的每个元素都等于 1/5。
At the start, every element of the vector is equal to 1/5.
现在我们深入探讨 PageRank 算法的具体细节,开始使用“PageRank”一词来表示网页的重要性。您会发现,使用这个术语是合理的,因为我们能够根据重要性得出网络上所有网页的排名。由于我们的行包含所有 PageRank,我们将其称为图表的PageRank 向量。
As we now get into the nuts and bolts of the PageRank algorithm, we’ll start using the term pagerank to refer to the significance of a web page. You will see that the term will be justified as we will be able to derive a ranking, in terms of importance, of all the pages on the web. As our row contains all the pageranks, we will call it the pagerank vector of our graph.
每个网页的重要性被分摊到它所链接的页面上。现在我们有了邻接矩阵,我们可以这样实现:对每一行,用该行中的每个 1 除以该行中 1 的总数。这相当于用每个页面的投票数除以该页面的出站链接数量。这样做,我们得到以下矩阵:
The importance of each web page is divided over the pages to which it links. Now that we have the adjacency matrix at hand, we can do that by going to each row and dividing each one in the row by the number of ones in that row. This is equivalent to dividing each page’s vote by the number of its outgoing links. If we do that, we get the following matrix:
我们将这个矩阵称为超链接矩阵。
We call this matrix the hyperlink matrix.
如果我们仔细观察超链接矩阵,会发现每一列都显示了页面的重要性是如何从链接到它的页面中得出的。以第一列为例,它与页面 1 的重要性相关。该页面的重要性源于页面 3 和页面 4。页面 3 将其重要性的 1/3 赋予页面 1,因为它链接到三个页面;页面 4 将其重要性的 1/2 赋予页面 1,因为它链接到两个页面。图中其他页面对页面 1 的重要性贡献为零,因为它们没有链接到它。我们可以将其表示为:
If we look carefully at the hyperlink matrix, each column shows how the importance of a page is derived from the pages that link to it. Take the first column, which relates to the importance of page 1. This page takes its significance from pages 3 and 4. Page 3 gives 1/3 of its importance to page 1 because it links to three pages, and page 4 gives 1/2 of its importance to page 1 because it links to two pages. Page 1 receives zero significance from the other pages in the graph because they do not link to it. Writing r(i) for the pagerank of page i, we can express this as:

r(1) = (1/3)r(3) + (1/2)r(4)
但这正是 r(1)(第 1 页的 PageRank)的定义。我们通过将 PageRank 向量的元素与超链接矩阵第一列的相应元素的乘积相加来获得这个 PageRank。
But this is exactly the definition of r(1), the pagerank of page 1. We got the pagerank by summing the products of the elements of the pagerank vector with the corresponding elements of the first column of the hyperlink matrix.
让我们看看如果我们取 pagerank 向量并将其元素的乘积与超链接矩阵第二列的相应元素相加会发生什么:
Let’s see what is happening if we take the pagerank vector and sum the products of its elements with the corresponding elements of the second column of the hyperlink matrix:
这正是 r(2)(第 2 页的 PageRank)的定义。PageRank 向量元素与超链接矩阵第三列内容的乘积之和同样会给我们 r(3)(第 3 页的 PageRank):
That is exactly the definition of r(2), the pagerank of page 2. The sum of the products of the elements of the pagerank vector with the contents of the third column of the hyperlink matrix will similarly give us r(3), the pagerank of page 3:
你可以验证一下,使用超链接矩阵的第四列和第五列,我们分别会得到 r(4) 和 r(5)。这个运算——将 PageRank 向量元素与超链接矩阵每一列内容的乘积相加——实际上就是 PageRank 向量与超链接矩阵的乘积。
You can verify that using the fourth and fifth columns of the hyperlink matrix we’ll get r(4) and r(5), respectively. This operation—of summing the products of the elements of the pagerank vector with the contents of each column of the hyperlink matrix—is actually the product of the pagerank vector with the hyperlink matrix.
除非你熟悉矩阵运算,否则这可能会令人困惑,因为我们通常讨论的是两个数的乘积,也就是常见的乘法,而不是向量和矩阵等结构之间的乘积。我们可以对其他实体(不仅仅是数字)定义数学运算,只要它适合我们。向量与矩阵的乘积就是这样一种运算。它本身并没有什么神秘之处:它只是我们定义的一种涉及向量和矩阵元素的特定计算。
Unless you are familiar with matrix operations, this may be confusing because we usually talk about the product of two numbers, which is the common multiplication, and not about the product of constructs like vectors and matrices. We can define mathematical operations on other entities, not just numbers, as long as it suits us. The product of a vector with a matrix is such an operation. There is no mystery involved in it: it is simply an operation that we define as a particular calculation involving the elements of the vector and matrix.
假设我们制作百吉饼和羊角面包,售价分别为 2.00 美元和 1.50 美元。我们有两家商店;某一天,第一家商店售出 10 个百吉饼和 20 个羊角面包,而第二家商店售出 15 个百吉饼和 10 个羊角面包。如何计算每家商店的总销售额?
Suppose that we make bagels and croissants that we sell for $2.00 and $1.50, respectively. We have two shops; on a particular day, the first shop sells 10 bagels and 20 croissants, while the second shop sells 15 bagels and 10 croissants. How do we find the total sales per shop?
为了找到第一家商店的总销售额,我们将百吉饼的价格乘以该商店销售的百吉饼数量,将羊角面包的价格乘以该商店销售的羊角面包数量,然后将这两个价格相加:
To find the total sales from the first shop, we will multiply the price of a bagel with the number of bagels sold in that shop, and the price of a croissant with the number of croissants sold there, and we’ll add these two:
我们将采取同样的方式,找出第二家商店的总销售额:
We’ll do the same thing to find the total sales from the second shop:
为了更简洁地表达这一点,我们将百吉饼和羊角面包的价格写成一个向量:
To express this more succinctly, we write down the prices for the bagels and croissants as a vector:
我们还将每日销售额写成一个矩阵。该矩阵有两列,每家商店一列;还有两行,一行代表百吉饼,一行代表羊角面包:
We also write down the daily sales in a matrix. The matrix will have two columns, one per shop, and two rows, one for the bagels and one for the croissants:
然后,为了计算每家商店的总销售额,我们将向量的元素与销售额矩阵的每一列相乘,然后相加。这定义了向量与矩阵的乘积:
Then to find the total sales per shop, we multiply the elements of the vector with each column of the sales matrix and add them up. This defines the product of the vector with the matrix:
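The bagel-and-croissant calculation can be carried out directly; this is a small sketch of the vector-matrix product as defined above, using the numbers from the example.

```python
# Vector-matrix product: total sales per shop.
prices = [2.00, 1.50]        # price of a bagel, price of a croissant
sales = [[10, 15],           # bagels sold in shop 1, shop 2
         [20, 10]]           # croissants sold in shop 1, shop 2

def vec_mat_product(v, m):
    """Multiply a row vector v by a matrix m: one sum of products per column."""
    return [sum(v[i] * m[i][j] for i in range(len(v)))
            for j in range(len(m[0]))]

totals = vec_mat_product(prices, sales)
```

The first entry of `totals` is the first shop's sales (2.00 × 10 + 1.50 × 20 = 50.00) and the second is the second shop's (2.00 × 15 + 1.50 × 10 = 45.00).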
向量与矩阵的乘积是两个矩阵乘积的一个特例。让我们扩展一下这个例子:不再用一个向量表示百吉饼和羊角面包的价格,而是用一个矩阵,其中包含每次销售的价格和利润:
The product of a vector with a matrix is a special case of the product of two matrices. Let’s extend the example so that instead of having a vector with the prices of the bagels and croissants, we have a matrix with the prices and profits per sale:
为了计算每家商店的总销售额和总利润,我们将创建一个矩阵,其中第 i 行第 j 列的元素是价格和利润矩阵第 i 行与销售额矩阵第 j 列乘积之和。这就是两个矩阵乘积的定义:
To find the total sales per shop and total profit per shop, we will create a matrix in which the entry in the ith row and jth column is the sum of products of the ith row of the prices and profits matrix with the jth column of the sales matrix. This is the definition of the product of the two matrices:
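The matrix-matrix product can be sketched the same way. The profit figures below are made-up values for illustration, since the text does not give them.

```python
def mat_product(a, b):
    """Entry (i, j) is the sum of products of row i of a with column j of b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# First row: prices per bagel and croissant (from the example).
# Second row: profits per bagel and croissant (hypothetical figures).
prices_profits = [[2.00, 1.50],
                  [0.50, 0.40]]
sales = [[10, 15],
         [20, 10]]
result = mat_product(prices_profits, sales)
# First row of result: total sales per shop.
# Second row of result: total profit per shop.
```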
回到 PageRank 的问题,每一轮 PageRank 向量的计算实际上是将上一轮 PageRank 向量的值与超链接矩阵相乘。随着轮次的推进,我们会得到连续的 PageRank 估计值,也就是由这些值组成的 PageRank 向量的连续估计值。为了获得这些连续的 PageRank 向量估计值,我们只需要将每一轮的向量与超链接矩阵相乘,从而得到下一轮的向量。
Returning to pagerank, in each round the calculation of the pagerank vector is really the product of the value of the pagerank vector in the previous round with the hyperlink matrix. As we go through the rounds, we get successive estimates of the pageranks—that is, successive estimates of the pagerank vector that is made up of them. To get these successive estimates of the pagerank vector we only need to multiply the vector in each round with the hyperlink matrix, thereby getting the vector for the next round.
在第一轮中,我们从一个 PageRank 向量开始,其内容均为 1/n,其中 n 是页面数。如果我们将第一个 PageRank 向量记为 π(0),将第一轮结束时的 PageRank 向量记为 π(1),将超链接矩阵记为 H,则有:
In the first round, we start with a pagerank vector whose contents are all equal to 1/n, where n is the number of pages. If we denote this first pagerank vector by π(0), the pagerank vector at the end of the first round by π(1), and the hyperlink matrix by H, we have:

π(1) = π(0)H
在每一轮中,我们都使用该轮的 PageRank 向量来计算下一轮的 PageRank 向量。在第二轮投票中,我们得到了第三个 PageRank 估计值,也就是第三个 PageRank 向量,我们进行了如下计算:
In each round we use the pagerank vector of that round to calculate the pagerank vector for the following round. In the second voting round, where we got our third pagerank estimates—that is, our third pagerank vector—we performed the calculation:

π(2) = π(1)H
在第三轮投票中,我们得到了第四个 PageRank 向量:
In the third voting round, we got our fourth pagerank vector:

π(3) = π(2)H
在每次迭代中,我们都将上一次迭代的结果乘以超链接矩阵,因此最终得到的是 PageRank 向量的逐次估计值与超链接矩阵的一系列乘积。如我们所见,这相当于将初始 PageRank 向量乘以超链接矩阵不断增大的幂,即 π(k) = π(0)H^k。这种计算逐次近似值的方法称为幂法。因此我们看到,计算一组网页的 PageRank 就是将幂法应用于 PageRank 向量和超链接矩阵,直到得到的 PageRank 向量不再变化,或者如我们所说,直到它收敛到一个稳定的值,也就是我们最终的 PageRank 指标。
In every iteration, we multiply the result of the previous iteration by the hyperlink matrix, so in the end we have a series of products of the successive estimates of the pagerank vector by the hyperlink matrix. As we can see, this is equivalent to multiplying the initial pagerank vector with increasing powers of the hyperlink matrix: π(k) = π(0)H^k. This method of calculating successive approximations is called the power method. We see therefore that the calculation of the pageranks of a set of web pages is an application of the power method to the pagerank vector and hyperlink matrix, until the resulting pagerank vector does not change, or as we say, until it converges to a stable value—our final pagerank metrics.
我们刚刚对如何计算网络图的 PageRank 给出了更精确的描述:
We have just reached a more precise description of how to calculate the pageranks of a web graph:
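The description above amounts to a short iterative procedure. This is a minimal sketch of it; the convergence tolerance is an implementation choice, and the two-page hyperlink matrix at the end is a hypothetical illustration, not the book's example.

```python
# The power method: start with equal pageranks and repeatedly multiply
# the pagerank vector by the hyperlink matrix until it stops changing.
def power_method(h, tolerance=1e-10, max_rounds=1000):
    n = len(h)
    pi = [1.0 / n] * n
    for _ in range(max_rounds):
        new_pi = [sum(pi[i] * h[i][j] for i in range(n)) for j in range(n)]
        if all(abs(a - b) < tolerance for a, b in zip(pi, new_pi)):
            return new_pi
        pi = new_pi
    return pi

# A tiny two-page illustration: page 1 gives all its vote to page 2;
# page 2 splits its vote between page 1 and itself.
ranks = power_method([[0.0, 1.0],
                      [0.5, 0.5]])
```

For this small matrix the iteration settles at (1/3, 2/3), which you can check is a fixed point of the multiplication.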
这种表述不仅简洁,还能让我们将问题转移到线性代数领域,即数学中处理矩阵及其运算的分支。我们可以借助一套完善的理论体系来研究幂法,以及矩阵运算(例如我们之前描述的乘法)的高性能实现。该问题的矩阵形式也有助于研究幂法是否总是收敛,以便我们总能求得图的 PageRank。
Apart from being succinct, this formulation allows us to transfer the problem to the realm of linear algebra, the branch of mathematics that treats matrices and operations on them. There is a well-established body of theory that we can use to investigate the power method as well as performant implementations of matrix operations, such as the multiplication that we described. The matrix formulation of the problem will also help investigate whether the power method will always converge so that we can always come up with a solution to the pageranks of a graph.
现在我们来看一个更简单的图的例子,它仅由三个节点组成:
We now turn to an example of a simpler graph, consisting of just three nodes:
我们想求出这三个节点的 PageRank。我们遵循相同的算法:将 PageRank 向量初始化为 (1/3, 1/3, 1/3),赋予所有节点相同的 PageRank,然后将 PageRank 向量乘以超链接矩阵,即:
We want to find the pageranks of these three nodes. We follow the same algorithm. We initialize the pagerank vector to (1/3, 1/3, 1/3), giving equal pageranks to all nodes. Then we multiply the pagerank vector with the hyperlink matrix, which is:
如果我们开始幂方法的迭代,将 PageRank 向量与超链接矩阵相乘以更新 PageRank 向量,然后一次又一次地进行,我们会发现经过四次迭代后,所有的 PageRank 都下降到了零:
If we start the iterations of the power method, multiplying the pagerank vector with the hyperlink matrix to update the pagerank vector, and then again and again, we’ll find out that after four iterations, all pageranks have gone down to zero:
| 轮次 Round | 第 1 页 Page 1 | 第 2 页 Page 2 | 第 3 页 Page 3 |
|---|---|---|---|
| 开始 start | 0.33 | 0.33 | 0.33 |
| 1 | 0.00 | 0.17 | 0.50 |
| 2 | 0.00 | 0.00 | 0.17 |
| 3 | 0.00 | 0.00 | 0.00 |
这显然是个问题。我们并不期望所有页面的重要性都为零。毕竟,页面 3 有两个反向链接,页面 2 只有一个反向链接,所以我们自然会期望这些反向链接显示在结果中,更不用说我们还希望所有页面的 PageRank 总和为 1 了。结果,所有页面的重要性都变得毫无意义。
That is clearly a problem. We do not expect all pages to have zero importance here. After all, page 3 has two backlinks and page 2 has one backlink, so somehow we would expect this to show on the results, let alone the fact that we also want the total sum of the pageranks to be one. Here nothing ended up being of any import at all.
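The collapse can be reproduced directly. The hyperlink matrix below encodes this three-node graph as the text describes it: page 1 splits its vote between pages 2 and 3, page 2 gives everything to page 3, and page 3 has no outgoing links at all.

```python
# The three-node example with a dangling node.
H = [[0.0, 0.5, 0.5],   # page 1 links to pages 2 and 3
     [0.0, 0.0, 1.0],   # page 2 links only to page 3
     [0.0, 0.0, 0.0]]   # page 3 is a dangling node: an all-zero row

pi = [1/3, 1/3, 1/3]
for _ in range(4):
    pi = [sum(pi[i] * H[i][j] for i in range(3)) for j in range(3)]
# The all-zero row drains every pagerank away: pi is now [0.0, 0.0, 0.0].
```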
问题的根源在于节点 3。虽然该节点有反向链接,因此会提升重要性,但它没有外链。因此,它在某种程度上吸收了图谱其余部分的重要性,但却没有将其重新分配到任何地方。它就像一个自私的节点或黑洞:进去的不会出去。经过几次迭代后,它就像一个汇聚点,所有 PageRank 值都流入其中,然后消失殆尽。
The cause of the problem is node 3. Although this node has backlinks and would thereby gain importance, it has no outgoing links. So in a way it sucks importance from the rest of the graph, but does not redistribute it anywhere. It acts as a selfish node or black hole: what goes in, does not go out. After a few iterations, it has acted as a sink where all pagerank values have gone in and vanished.
这类节点被称为悬垂节点,因为它们悬挂在图的(死)端。在网络上,没有什么可以阻止这类页面的存在。虽然网页通常同时包含传入和传出链接,但没有传出链接的页面也可能出现,并且会严重破坏我们之前描述的幂律方法。
Such nodes are called dangling nodes because they hang at the (dead) ends of the graph. On the web, nothing prohibits the existence of such pages. Although web pages usually have both incoming and outgoing links, a page with no outgoing links can appear and would wreak havoc with the power method as we have described it.
为了解决这个问题,我们用了一个比喻。我们想象一个人在网上冲浪,从一个页面跳转到另一个页面。要从一个页面转到另一个页面,冲浪者通常会点击链接。但是,冲浪者会遇到一个悬挂节点:一个没有链接到任何其他页面的页面。我们不希望冲浪者被困在那里,所以我们赋予冲浪者跳转到网络上任何页面的能力。这就像我们在网上冲浪,从一个页面跳转到另一个页面,直到到达死胡同。即使到达死胡同,我们也不会放弃。我们随时可以在网络浏览器中输入另一个地址,然后转到我们想要的任何其他网页,即使悬挂页面上没有指向该网页的链接。这就是我们希望冲浪者做的事情。当冲浪者不知道该去哪里时,他会从网络上选择一个页面,任意一个页面,然后继续冲浪。冲浪者成为一名随机冲浪者,配备传送装置,可以立即将冲浪者带到任何地方。
To overcome the problem, we work with a metaphor. We imagine that we have a human who surfs the web, jumping from page to page. To go from one page to another, the surfer normally follows a link. But then the surfer comes upon a dangling node: a page with no links to any other page. We don’t want our surfer to remain trapped in there, so we give the surfer the capability to jump to any other page, anywhere on the web. It is as if we are surfing the web from page to page until we reach a dead end. Even when we get there, we don’t give up and stop. We can always type another address in our web browser and move to any other web page we want, even if no links exist to it from the dangling page. This is what we want our surfer to do. When at a loss about where to go, the surfer will pick a page, any page, from the web and go there to continue surfing. The surfer becomes a random surfer, equipped with a teleportation device that can take the surfer instantly to any place at all.
把这个比喻带回到 PageRank 上,我们将超链接矩阵解释为:它给出了浏览者点击链接到达特定页面的概率。在我们的三节点例子中,超链接矩阵的第一行告诉我们,当浏览者在第 1 页时,选择第 2 页或第 3 页的概率相同。第二行告诉我们,当浏览者在第 2 页时,他总是会选择访问第 3 页。回到第一个例子,如果浏览者访问的是第 5 页,那么他有可能访问第 2、3 或 4 页,每种情况的概率均为 1/3。
To take this metaphor back to pagerank, we interpret the hyperlink matrix as giving us the probabilities that a surfer will follow a link to go to a particular page. In our three-nodes example, the first row of the hyperlink matrix tells us that when on page 1, the surfer will choose either page 2 or 3 with equal probability. The second row tells us that when on page 2, the surfer will always choose to visit page 3. Going back to our first example for a moment, if the surfer lands on page 5, then it is possible to go to page 2, 3, or 4 with a probability of 1/3 for each of these outcomes.
悬挂节点表现为矩阵中出现一整行零。这样一来,浏览者没有任何概率去往任何地方。这时,随机浏览者就发挥作用了。正如我们所说,该浏览者会跳转到图中的任意页面。这意味着,实际上我们改变了超链接矩阵,使其不再包含全零的行。由于我们希望浏览者以相同的概率跳转到任何网页,因此我们用 1/n(在我们的示例中为 1/3)来填充该行,而不是零。我们的矩阵将变为:
A dangling node manifests itself in the presence of a row full of zeros. Then there is no probability that the surfer will go anywhere. This is where the random surfer kicks in. As we said, that surfer will jump to any page in the graph. That means that in effect, we change the hyperlink matrix so that it no longer has rows with zeros. As we want the surfer to jump to any web page with equal probability, instead of zeros we’ll fill the row with 1/n, or in our example, 1/3. Our matrix will become:
现在,到达第 3 页的浏览者可以以相同的概率访问图中的任意页面。浏览者甚至可能暂时停留在同一页面上,但这无关紧要,因为浏览者可以反复尝试,最终会随机选择一个不同的目标页面。我们将这个修改后的超链接矩阵称为 S 矩阵:其中我们将全零行改为值均等于 1/n 的行。如果我们使用 S 矩阵运行幂法,则 PageRank 的演变将是:
Now the surfer who lands on page 3 can go to any page in the graph with equal probability. The surfer may even stay temporarily on the same page, but that does not matter, as the surfer can try again and again, and at some point a different target page will be selected at random. We call this modified hyperlink matrix, where we change zero rows to rows with values equal to 1/n, the S matrix. If we run the power method using the S matrix, then the evolution of the pageranks will be:
| 轮次 Round | 第 1 页 Page 1 | 第 2 页 Page 2 | 第 3 页 Page 3 |
|---|---|---|---|
| 开始 start | 0.33 | 0.33 | 0.33 |
| 1 | 0.11 | 0.28 | 0.61 |
| 2 | 0.20 | 0.26 | 0.54 |
| 3 | 0.18 | 0.28 | 0.54 |
| 4 | 0.18 | 0.27 | 0.55 |
| 5 | 0.18 | 0.27 | 0.54 |
这次算法收敛到了非零值;没有发生重要性被吸走的情况。而且,结果也合情合理。PageRank最高的是页面3,它有两个反向链接;其次是页面2,有一个反向链接;然后是页面1,它完全没有反向链接。
This time the algorithm converges to nonzero values; no sucking out of importance occurs. Also, the results make sense. The highest pagerank is achieved by page 3, which has two backlinks; then comes page 2, with one backlink, and then page 1, which has no backlinks at all.
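The convergence can be checked by running the same iteration with the S matrix, whose third row now contains 1/3 everywhere in place of the dangling node's zeros.

```python
# The S matrix: the dangling node's zero row replaced by 1/n everywhere.
S = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0],
     [1/3, 1/3, 1/3]]

pi = [1/3, 1/3, 1/3]
for _ in range(100):
    pi = [sum(pi[i] * S[i][j] for i in range(3)) for j in range(3)]
# pi converges to (2/11, 3/11, 6/11), roughly (0.18, 0.27, 0.55).
```

The limit (2/11, 3/11, 6/11) is exactly the fixed point of the multiplication: a vector that, multiplied by S, gives itself back.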
我们似乎已经解决了这个问题,但在更复杂的情况下,类似的问题仍然会出现。下图没有悬垂节点:
We seem to have solved the problem, but a similar issue raises its head in more complex situations. The following graph has no dangling nodes:
如果我们运行该算法,我们会发现两个节点(页面 1 和页面 4)最终的 PageRank 为零:
If we run the algorithm, we find that two nodes, pages 1 and 4, end up with zero pagerank:
| 轮次 Round | 第 1 页 Page 1 | 第 2 页 Page 2 | 第 3 页 Page 3 | 第 4 页 Page 4 | 第 5 页 Page 5 | 第 6 页 Page 6 |
|---|---|---|---|---|---|---|
| 开始 start | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 |
| 1 | 0.08 | 0.22 | 0.14 | 0.00 | 0.42 | 0.14 |
| 2 | 0.00 | 0.25 | 0.25 | 0.00 | 0.29 | 0.21 |
| 3 | 0.00 | 0.22 | 0.22 | 0.00 | 0.33 | 0.22 |
实际情况是,即使没有悬挂节点,也存在一组节点充当图其余部分的汇聚点。仔细观察该图,你会发现节点 2、3、5 和 6 作为一个组,只有传入链接。虽然可以从节点 1 或 4 进入该组,但一旦进入,就只能在组内移动,无法离开。这一次,我们的随机浏览者不是被困在单个网页内,而是被困在一组只在彼此之间链接的页面内。
What happened is that even though there is no dangling node, there is a set of nodes that act as a sink for the rest of the graph. If you scrutinize the graph, you will see that the nodes 2, 3, 5, and 6, taken together as a group, have only incoming links. It is possible to go from node 1 or 4 to this group, but once we are in, we can only move inside the group. We are not able to go outside. Our random surfer will be trapped, not inside a single web page this time, but inside a group of pages that link only between themselves.
我们再次需要帮助随机浏览者摆脱这个陷阱。这次的解决方案需要对超链接矩阵进行更全面的修改。我们最初的超链接矩阵允许浏览者仅使用原始图中现有的链接从一个页面跳转到另一个页面。然后,我们修改了超链接矩阵,使其能够处理元素全为零的行,并提出了一个S矩阵,使浏览者能够摆脱悬空节点。这使得随机浏览者在处于悬空节点时能够跳转到图中的任意位置。现在,我们将通过修改S矩阵进一步改变随机浏览者的行为。
We need again to help the random surfer escape from this trap. This time the solution requires more comprehensive changes to the hyperlink matrix. Our initial hyperlink matrix allowed the surfer to go from page to page only using the existing links in the original graph. Then we modified the hyperlink matrix to handle rows with all zero elements and came up with the S matrix that allowed the surfer to get away from dangling nodes. This enabled the random surfer to jump to anywhere in the graph when in a dangling node. Now we will change the behavior of the random surfer a bit more by modifying the S matrix.
目前,当冲浪者落在某个节点上时,可能的移动方式就是 S 矩阵所指示的。在最后这个例子中,S 矩阵与超链接矩阵相同,因为不存在全零行:
Right now, when a surfer lands on a node, the possible moves are those indicated by the S matrix. In the last example, the S matrix is the same as the hyperlink matrix because no zero rows exist:
如果随机冲浪者落在第 5 页,那么可能的移动方向是第 2、3 或 6 页,概率均为 1/3,正如 S 矩阵所示。我们将使随机冲浪者更加灵活:它并非总是按照 S 矩阵移动,而是以我们选定的某个概率 a 这样做;而以概率 1 − a,随机冲浪者会跳到图中的任意位置,不受 S 矩阵的约束。
If the random surfer lands on page 5, then the possible moves are to pages 2, 3, or 6, all with probability 1/3, as the S matrix indicates. We will make the random surfer more agile, with the power to move following the S matrix not always, but with some probability a that we will choose; then with probability 1 − a, the random surfer will jump anywhere in the graph, unconstrained by the S matrix.
能够从图中的任意位置跳转到任意位置,意味着矩阵中不能有任何零,因为零元素表示无法进行的移动。为了实现我们想要的效果,我们需要将一行中的零元素增加某个值,并相应减少非零元素,使整行元素的总和始终为 1。矩阵的最终精确值可以基于 S 和概率 a 通过线性代数计算得出。由此导出的新矩阵称为 Google 矩阵,我们用符号 G 表示。如果随机冲浪者的行为由 Google 矩阵决定,那么它就会按我们的预期运行:冲浪者以概率 a 遵循 S 矩阵,并以概率 1 − a 独立移动。在我们的示例中,Google 矩阵为:
The ability to jump from anywhere to anywhere in the graph means that we cannot have any zeros at all in the matrix—because a zero entry denotes a move that cannot be made. To achieve what we want, we will need to increase the zero entries in a row by some value and decrease the nonzero entries so that the whole row always sums up to one. The exact final values of the matrix can be calculated through linear algebra, based on S and the probability a. The new matrix that will be derived is called the Google matrix, and we use the symbol G for it. If the behavior of the random surfer is determined by the Google matrix, it will work out as we want: the surfer will appear to be following the S matrix with probability a and move independently with probability 1 − a. In our example, the Google matrix is:
将其与 S 矩阵进行比较。观察第一行:原来有两个元素为 1/2,其余元素为零。现在在 Google 矩阵中,这两个元素变为 a/2 + (1 − a)/6,其余元素则从 0 变为 (1 − a)/6。其他行也发生了类似的变换。那么,如果随机浏览者位于第 1 页,可能的移动是:以 a/2 + (1 − a)/6 的概率前往第 2 页或第 5 页,或以 (1 − a)/6 的概率前往其他任意页面。
Compare that to the S matrix. Observe that in the first row, we had two entries equal to 1/2 and the rest were zero. Now in the Google matrix, we have the two entries turned to a/2 + (1 − a)/6, and the rest of the entries turned from 0 to (1 − a)/6. Similar transformations have occurred in the other rows. If, then, the random surfer lands on page 1, the possible moves out are to pages 2 and 5 with probability a/2 + (1 − a)/6 for either of them, or any other page with probability (1 − a)/6 for each one of them.
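The transformation is mechanical: every entry of S is scaled by a, and (1 − a)/n is added everywhere. This sketch applies it to the three-node S matrix; the damping probability a = 0.85 is only an illustrative choice, not a value fixed by the text.

```python
# Google matrix: with probability a follow the S matrix,
# with probability 1 - a teleport uniformly to any of the n pages.
def google_matrix(S, a=0.85):   # a = 0.85 is an illustrative choice
    n = len(S)
    return [[a * S[i][j] + (1 - a) / n for j in range(n)] for i in range(n)]

S = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0],
     [1/3, 1/3, 1/3]]
G = google_matrix(S)
# Every row of G still sums to one, and no entry is zero anymore.
```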
现在我们可以给出 PageRank 算法的最终定义:
We are now able to give the final definition of the PageRank algorithm:
我们简单地用“Google矩阵”替换了初始算法中的“超链接矩阵”。如果我们在图中追踪该算法,并将其与一组接收器节点关联起来,我们将得到:
We simply substituted “Google matrix” for “hyperlink matrix” of the initial algorithm. If we trace this algorithm in our graph with the group of sink nodes, we’ll get:
| 轮次 Round | 第 1 页 Page 1 | 第 2 页 Page 2 | 第 3 页 Page 3 | 第 4 页 Page 4 | 第 5 页 Page 5 | 第 6 页 Page 6 |
|---|---|---|---|---|---|---|
| 开始 start | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 |
| 1 | 0.10 | 0.14 | 0.14 | 0.10 | 0.31 | 0.21 |
| 2 | 0.07 | 0.15 | 0.17 | 0.07 | 0.31 | 0.23 |
| 3 | 0.05 | 0.14 | 0.18 | 0.05 | 0.32 | 0.26 |
| 4 | 0.05 | 0.14 | 0.17 | 0.05 | 0.33 | 0.27 |
效果很好;我们不再得到零页面排名。
It works out fine; we get no zero pageranks anymore.
使用 Google 矩阵的幂法总是有效的。线性代数告诉我们,它总是会收敛到一组最终的 PageRank 值,这些值的和为 1,而不会受悬挂节点或图的某些部分耗尽其余部分 PageRank 的影响。我们甚至不需要在开始时将 PageRank 精确地初始化为 1/n。任何初始值都可以,只要它们的和为 1。
The power method with the Google matrix will always work. Linear algebra tells us that it will always converge to a final set of pagerank values, the sum of which will be one, without suffering from dangling nodes or parts of the graph draining the pageranks of the rest of the graph. We don’t even need to initialize the pageranks to exactly 1/n when we start. Any initial set of values will do, as long as they sum up to one.
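The claim that any starting vector summing to one works can be checked on the three-node example. Two quite different starting vectors are iterated below with the same Google matrix (a = 0.85 is again only an illustrative damping value); both end up at the same pageranks.

```python
def iterate(pi, G, rounds=200):
    """Run a fixed number of power-method rounds from the starting vector pi."""
    n = len(G)
    for _ in range(rounds):
        pi = [sum(pi[i] * G[i][j] for i in range(n)) for j in range(n)]
    return pi

a = 0.85   # illustrative damping probability
S = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0],
     [1/3, 1/3, 1/3]]
G = [[a * S[i][j] + (1 - a) / 3 for j in range(3)] for i in range(3)]

first = iterate([1/3, 1/3, 1/3], G)   # equal start
second = iterate([0.8, 0.1, 0.1], G)  # lopsided start, still sums to one
# Both runs agree to many decimal places, and the result sums to one.
```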
我们已经确定有一种方法可以在任何图表中找到页面排名,但问题仍然是结果最终是否合理。
Having established that we have a method to find the pageranks in any graph, the question remains whether the results are in the end sensible.
按照我们定义的方式,PageRank 向量相对于 Google 矩阵而言是一个特殊的向量。幂法结束后,PageRank 向量不再变化。因此,如果我们将 PageRank 向量乘以 Google 矩阵,得到的就是相同的 PageRank 向量。在线性代数中,这个向量被称为 Google 矩阵的第一特征向量。无需深入数学细节,其底层理论支持这样的观点:这个向量对该矩阵具有某种特殊意义。
The pagerank vector, in the way that we have defined it, is a special vector in relation to the Google matrix. When the power method finishes, the pagerank vector does not change any more. Therefore if we multiply the pagerank vector by the Google matrix, we get simply the same pagerank vector back. In linear algebra, this vector is called the first eigenvector of the Google matrix. Without going deep into the mathematics, the underlying theory supports the notion that this vector has some special significance to the matrix.
除了数学之外,PageRank 是否是衡量网页重要性的好方法的最终决定因素是其结果对我们人类的实用性。谷歌搜索引擎给出的结果很好,这意味着结果符合我们(搜索引擎用户)认为重要的内容。如果PageRank向量只是一种数学上的奇思妙想,与网页的重要性无关,我们今天就不会关注它了。
Beyond mathematics, the final arbiter of whether PageRank is a good way to assign importance to web pages is the utility of its results to us humans. The Google search engine gives good results, meaning that the results are in accordance with what we, the users of the search engine, regard as being important. If the pagerank vector was a mathematical curiosity that bore no relation to the significance of web pages, we would not be concerned with it today.
PageRank 的另一个优势是它可以高效地实现。Google 矩阵非常庞大:我们需要为网络上的每个页面分配一行和一列。然而,正如我们所见,Google 矩阵是从 S 矩阵派生而来的,而 S 矩阵又从超链接矩阵派生而来。我们实际上并不需要创建和存储 Google 矩阵本身;我们可以通过对超链接矩阵进行矩阵运算来动态创建它。这很方便。与没有任何零元素的 Google 矩阵相比,超链接矩阵中有大量零元素。网络可能有数十亿个页面,但每个页面都只链接到少数其他网页。超链接矩阵就是我们所说的稀疏矩阵:它大部分是零,只有一些非零元素,其数量比零元素少若干个数量级。因此,我们可以使用巧妙的技巧来存储矩阵:不必占用一大块内存来存放大量零和少量非零元素,而是只存储非零元素出现的位置。与存储整个超链接矩阵相比,我们只需存储非零元素的坐标,这只需要很小一部分存储空间。这在 PageRank 算法的实际实现中给我们带来了巨大优势。
An additional advantage of PageRank is that it can be implemented efficiently. The Google matrix is huge; we need one row and one column for every single page on the web. Yet the Google matrix is derived, as we saw, from the S matrix, which in turn is derived from the hyperlink matrix. We do not really need to create and store the Google matrix itself; we can create it dynamically with matrix operations on the hyperlink matrix. This is convenient. In contrast to the Google matrix, which has no zeros anywhere, the hyperlink matrix has lots and lots of zeros. The web may have billions of pages, but every single page links to only a small number of other web pages. The hyperlink matrix is what we call a sparse matrix: one that is mostly full of zeros, with only some nonzero entries, which are orders of magnitude fewer than the zero entries. Thus we can store the matrix using clever techniques that instead of requiring a big slab of memory to fill with mostly zeros and a few nonzeros, store only the positions where the nonzeros occur. Rather than storing the whole hyperlink matrix, we need only store the coordinates of the nonzero entries, which will require only a fraction of the storage space. This gives us big leverage in the practical implementations of the PageRank algorithm.
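The sparse-storage idea can be sketched in a few lines: keep only the coordinates and values of the nonzero entries, rather than the full table. (Production systems use specialized sparse formats; a plain dictionary is enough to show the idea.)

```python
# Sparse storage sketch: a dictionary mapping (row, column) coordinates
# to the nonzero values, instead of an n x n table mostly full of zeros.
def to_sparse(matrix):
    return {(i, j): value
            for i, row in enumerate(matrix)
            for j, value in enumerate(row) if value != 0}

H = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
sparse_H = to_sparse(H)
# Only 3 of the 9 entries need storing; a missing key means zero.
```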
最后,需要注意的是:虽然我们知道 PageRank 在 Google 的成功中发挥了至关重要的作用,但我们并不清楚 PageRank 在 Google 中是如何应用的,甚至是否应用。Google 搜索引擎多年来一直在不断发展,但这些变化并未公开。我们知道 Google 会利用我们过去的搜索记录来优化搜索结果。它可以根据我们所在的国家/地区调整搜索结果,还可以参考全球其他用户搜索的整体趋势。所有这些都是 Google 用来改进产品并在搜索引擎领域保持领先地位的秘诀。然而,这并不会降低算法在解决网页排名问题(以图中的节点表示)时的效率。1
Finally, an important caveat. Although we know that PageRank played a crucial role in the success of Google, we do not know how, or even if, PageRank is used in Google today. The Google search engine has been evolving during the years, and the changes are not made public. We know that Google uses our past searches to fine-tune the results that it presents to our queries. It can tune the results depending on the country that we live in. It can also take into account the overall trends in the queries that other people make all around the world. All these are part of the secret sauce that Google uses to improve its product and retain its position in the search engine business against competitors. This, however, does not detract from the algorithm’s efficiency in solving the problem of ranking web pages, represented as nodes in a graph.1
PageRank 凸显了算法的另一个方面。算法的成功不仅仅取决于其优雅和效率,还取决于算法如何映射到问题上。这是一种创造性的行为。要解决网络搜索问题,必须克服网络规模庞大这一难题。但一旦你将网络视为一个图,它的大小就变成了优势,而不是障碍。正是因为有如此多的页面相互链接,你才可以期望基于图的链接结构的方法最终能够奏效。找到为问题建模的方法,是找到用算法解决问题的方法的第一步。
PageRank highlights an additional aspect of algorithms. The success of an algorithm does not hinge only on its elegance and efficiency. It also has to do with the mapping of the algorithm to a problem. This is a creative act. To solve the problem of web search, one has to overcome the issue of the sheer size of the web. But once you conceive of the web as a graph, its size turns into an advantage, not a hindrance. It is exactly because there are so many pages, hyperlinked to each other, that you may expect that a method that is based on the link structure of the graph will in the end work. Finding the way to model a problem is the first step in finding the way to solve it with an algorithm.
近年来,深度学习系统异军突起,频频登上主流媒体头条。我们亲眼目睹计算机系统完成人类才能完成的壮举。更引人入胜的是,这些系统常常被描述为与人类思维运作方式存在相似之处——这自然让人不禁想到,人工智能的关键或许在于模仿人类智能的运作方式。
Deep learning systems have burst onto the scene in recent years, often making headlines in mainstream media. There we see computer systems performing feats that were the purview of humans. Even more tantalizing is the fact that these systems are frequently presented as having some similarities to the way the human mind works—which of course cues to the idea that perhaps the key for artificial intelligence may be to mimic the workings of human intelligence.
抛开炒作,大多数致力于深度学习的科学家并不认同深度学习系统像人类思维那样运作的观点。其目标是展现一些我们通常与智能联系起来的有用行为。然而,我们并非刻意照搬自然;事实上,人类大脑的结构太过复杂,难以在计算机上模拟。但我们确实从大自然中汲取了一些养分,对其进行极大的简化,并尝试设计出能够在某些领域完成通常由经过数百万年进化的生物系统所完成任务的系统。此外,正如本书所关注的,深度学习系统可以通过其所采用的算法来理解。这将阐明它们究竟在做什么以及如何做,并应有助于我们看到:在它们的成就背后,核心思想并不复杂。但这不应贬低该领域的成就。我们将会看到,深度学习需要巨大的人类智慧才能实现。
Brushing aside the hype, most scientists working on deep learning do not subscribe to the view that deep learning systems work like the human mind. The goal is to exhibit some useful behavior, which we often associate with intelligence. We do not go about copying nature, however; in fact, the architecture of the human brain is much too complicated to emulate on a computer. But we do take some leaves out of nature’s book, simplify them a lot, and try to engineer systems that could, in certain fields, do things usually done by biological systems that have evolved over millions of years. Moreover, and this concerns us here in this book, deep learning systems can be understood in terms of the algorithms they employ. This will shed some light on what they do exactly, and how. And it should help us see that underneath their accomplishments, the main ideas are not complicated. That should not belittle the achievements of the field. We’ll see that deep learning requires an enormous amount of human ingenuity in order to come to fruition.
要理解深度学习,我们需要从小处着手,从最基本的开始。在此基础上,我们将构建一个越来越精细的图景,直到本章结束时,我们能够理解深度学习中“深度”的含义。
To understand what deep learning is about, we need to start small, from humble beginnings. On these we will build a more and more elaborate picture, until, at the end of the chapter, we will be able to make sense of what the “deep” in deep learning stands for.
我们的出发点将是深度学习系统的主要构建模块,它确实源自生物学。大脑是神经系统的一部分,而神经系统的主要组成部分是被称为神经元的细胞。神经元具有特殊的形状;它们看起来不同于我们通常与细胞联系在一起的球状结构。下面是最早的神经元图像之一,由现代神经科学的奠基人之一、西班牙人圣地亚哥·拉蒙-卡哈尔于 1899 年绘制。1
Our starting point will be the main building block of deep learning systems, which does come from biology. The brain is part of the nervous system, and the main components of the nervous system are cells called neurons. Neurons have a particular shape; they look different from the globular structures that we usually associate with cells. You can see below one of the first images of neurons, drawn in 1899 by the Spaniard Santiago Ramón y Cajal, a founder of modern neuroscience.1
图像中间突出的两个结构是鸽子大脑的两个神经元。如你所见,神经元由细胞体和从中伸出的细丝组成。这些细丝通过突触将神经元连接到其他神经元,从而将神经元嵌入网络中。神经元是不对称的。每个神经元的一侧有许多细丝,另一侧只有一根细丝。我们可以将一侧的许多细丝视为神经元的输入,将另一侧的长向外细丝视为神经元的输出。神经元从其传入突触中获取电信号形式的输入,并可能向其他神经元发送信号。它接收的输入越多,输出信号的可能性就越大。我们说神经元随后被激发或激活。
The two structures that stand out in the middle of the image are two neurons of the pigeon brain. As you can see, a neuron consists of a cell body and the filaments that extrude from it. These filaments connect a neuron to other neurons through synapses, embedding the neurons in a network. The neurons are asymmetrical. There are many filaments on the one side and one filament on the other side of each neuron. We can think of the many filaments on the one side as the neuron’s inputs, and the long outgoing filament on the other side as the neuron’s output. The neuron takes input in the form of electric signals from its incoming synapses and may send a signal to other neurons. The more inputs it receives, the more likely it is to output a signal. We say that the neuron then fires or is activated.
人脑是一个庞大的神经元网络,数量约为一千亿,每个神经元平均与数千个其他神经元相连。我们目前还没有办法构建类似的东西,但我们可以用简化、理想化的神经元模型来构建系统。这是一个人工神经元的模型:
The human brain is a vast network of neurons, which number about one hundred billion, and each one of them is connected on average to thousands of other neurons. We do not have the means to build anything like that, but we can build systems out of simplified, idealized models of neurons. This is a model of an artificial neuron:
那是生物神经元的抽象版本,仅仅是具有多个输入和一个输出的结构。生物神经元的输出取决于其输入;同样,我们希望人工神经元根据其输入进行激活。我们不在大脑生物化学的领域,而是在计算的世界中,因此我们需要为我们的人工神经元建立一个计算模型。我们假设神经元接收和发送的信号是数字。然后,人工神经元获取所有输入,基于它们计算一些算术值,并在其输出上产生一些结果。我们不需要任何特殊电路来实现人工神经元。你可以将它想象成计算机内的一个小程序,它获取输入并产生输出,就像任何其他计算机程序一样。我们不需要真正构建人工神经网络;我们可以而且确实在模拟它们。
That is an abstract version of a biological neuron, being just a structure with a number of inputs and one output. The output of a biological neuron depends on its input; similarly, we want the artificial neuron to be activated depending on its input. We are not in the realm of brain biochemistry, but in the world of computing, so we need a computational model for our artificial neuron. We assume that the signals received and sent by neurons are numbers. Then the artificial neuron takes all its inputs, calculates some arithmetic value based on them, and produces some result on its output. We do not need any special circuit for implementing an artificial neuron. You can think of it as a small program inside a computer that takes its inputs and produces an output, much like any other computer program. We do not need to build artificial neural networks literally; we can and do simulate them.
Part of the learning process in biological neural networks is the strengthening or weakening of the synapses between neurons. The acquisition of new cognitive abilities and absorption of knowledge result in some synapses between neurons getting stronger, while others get weaker or even drop off completely. Moreover, synapses may not only excite a neuron to fire but also inhibit its activation; when a signal arrives on that synapse, the neuron should not fire. Babies actually have more synapses in their brains than adults. Part of growing up is pruning the neural network inside our heads. Perhaps we could think of the infant brain as a block of marble; as we go through the years of our lives, the block is chipped by our experiences and the things we learn, and a form emerges.
In an artificial neuron, we approximate the plasticity of synapses, their excitatory or inhibitory role, through weights we apply to the inputs. In our model artificial neuron, we have n inputs, x_1, x_2, . . . , x_n. To each one of them we apply a weight, w_1, w_2, . . . , w_n. Each weight is multiplied by the corresponding input. The final input received by the neuron is the sum of the products: w_1x_1 + w_2x_2 + . . . + w_nx_n. To this weighted input we add a bias b, which you can think of as the propensity the neuron has to fire; the higher the bias, the more likely the neuron is to be activated, while a negative bias added to the weighted input will actually inhibit the neuron from firing.
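The weighted input just described takes only a few lines to compute. The numbers below are made up purely for illustration:

```python
# A neuron's weighted input: each input x_i multiplied by its weight w_i,
# the products summed, and the bias b added.
inputs  = [0.5, -1.0, 2.0]   # x_1, x_2, x_3 (illustrative values)
weights = [0.8,  0.2, 0.1]   # w_1, w_2, w_3
bias = 0.5                   # b

weighted_input = sum(w * x for w, x in zip(weights, inputs)) + bias
print(round(weighted_input, 6))  # 0.4 - 0.2 + 0.2 + 0.5 = 0.9
```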
The weights and bias are the parameters of the neuron because they influence its behavior. As the output of a biological neuron depends on its inputs, so the output of an artificial neuron depends on the input it gets. This happens by feeding the input into a special activation function, the result of which is the neuron's output. This is what happens, diagrammatically, using f as a stand-in for the activation function:
The simplest activation function is a step function, giving us a result of 0 or 1. The neuron fires and outputs 1 if the input to the activation function is greater than 0, or stays silent outputting 0 otherwise:
Instead of a bias, it is helpful to think of a threshold. The neuron outputs 1 if the weighted input exceeds a threshold, or outputs 0 otherwise. Indeed, if we write the behavior of the neuron as a formula, the first condition is w_1x_1 + w_2x_2 + . . . + w_nx_n + b > 0, or w_1x_1 + w_2x_2 + . . . + w_nx_n > -b. By using t = -b, we get w_1x_1 + w_2x_2 + . . . + w_nx_n > t, where t, the opposite of the bias, is the threshold that the weighted input needs to pass for the neuron to fire.
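A neuron with a step activation can be sketched directly from this formula; the example inputs and weights below are invented for illustration:

```python
def step(z):
    """Step activation: fire (output 1) if the input is positive, else stay silent."""
    return 1 if z > 0 else 0

def neuron_output(inputs, weights, bias):
    weighted = sum(w * x for w, x in zip(weights, inputs))
    return step(weighted + bias)

# With bias b = -1.0, the threshold is t = -b = 1.0: the neuron fires
# only when the weighted sum of its inputs exceeds 1.0.
print(neuron_output([1.0, 1.0], [0.6, 0.6], -1.0))  # weighted sum 1.2 > 1.0, so 1
print(neuron_output([1.0, 0.0], [0.6, 0.6], -1.0))  # weighted sum 0.6 < 1.0, so 0
```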
In practice we tend to use other, related activation functions instead of the step function. On the next page you can see three common ones.
The one on the top is called sigmoid because it has an S shape.2 Its output ranges from 0 to 1. A large positive input results in an output close to 1; a large negative input results in an output close to 0. This approximates a biological neuron that fires on large inputs and stays silent otherwise, and is a smooth approximation to the step function. The activation function in the middle is called tanh, short for hyperbolic tangent (there are various ways to pronounce it: “tan-H,” “then,” or “thents” with a soft th, as in thanks).3 It looks like the sigmoid function, but it differs in that its output ranges from -1 to +1; a large negative input results in a negative output, mimicking an inhibitory signal. The function at the bottom is called a rectifier; it turns all negative inputs to 0; otherwise its output is directly proportional to its input. The following table shows the output of the three activation functions for different inputs.
| | -5 | -1 | 0 | 1 | 5 |
|---|---|---|---|---|---|
| sigmoid | 0.01 | 0.27 | 0.5 | 0.73 | 0.99 |
| tanh | -1 | -0.76 | 0 | 0.76 | +1 |
| rectifier | 0 | 0 | 0 | 1 | 5 |
If you wonder why the proliferation of activation functions (there are also others), it is because it has been found in practice that particular activation functions are more suitable in some applications than others. As the activation function is crucial for the behavior of a neuron, neurons are often named by their activation functions. A neuron that uses the step function is called a Perceptron.4 Then we have sigmoid and tanh neurons. We also call neurons units, and a neuron using the rectifier is called a ReLU, for rectified linear unit.
A single artificial neuron can learn to distinguish between two sets of things. For example, take the data in the figure on the top of the next page, portraying a set of observations with two features, x_1, on the horizontal axis, and x_2, on the vertical axis. We want to build a system that will tell apart the two blobs. Given any item, the system will be able to decide whether the item falls in one group or the other. In effect, it will create a decision boundary, like in the figure at the bottom. For any combination of x_1 and x_2, it will tell us whether the item belongs to the lighter or darker group.
The neuron will have only two inputs. It will take each pair of values x_1, x_2 and calculate an output. If we are using the sigmoid activation function, the output will be between 0 and 1. We’ll take the values greater than 0.5 to fall into one group and the other values to fall into the other. In this way the neuron will act as a classifier, sorting our data into distinct classes. But how does it do that? How can the neuron get to the point of being able to classify data?
At the moment of its creation, our neuron cannot recognize any kind of data; it learns to recognize them. The way it learns is by example. The whole process is akin to having a student learn something by giving them a large bunch of problems on a subject, along with their solutions. We ask the student to study each problem and its solution. If they are diligent, we expect that after the student has gone through a number of problems, they will have figured out how to get from a problem to its solution and will even be able to solve new problems, related to the ones they studied, but this time without having recourse to any solutions.
When we do this, we train the computer to find the solutions; the set of solved example problems is called the training data set. This is an instance of supervised learning because the solutions guide the computer, like a supervisor, toward finding the right answers. Supervised learning is the most common form of machine learning, the entire discipline that deals with methods where we train computers to do things. Apart from supervised learning, machine learning also encompasses unsupervised learning, where we provide the computer with a training data set, but not with any accompanying solutions. There are important applications of unsupervised learning, like, for example, grouping observations into different clusters (there is no a priori solution to what a correct cluster of observations is). In general, though, supervised learning is more powerful than unsupervised learning, as we provide more information during training. We will only deal with supervised learning here.
After training, the student often passes some tests to see how well they mastered the material. Similarly, in machine learning, after training we give the computer another data set that it has not seen before and ask it to solve this test data set. Then we evaluate the performance of the machine learning system based on how well it manages to solve the problems in the test data set.
In the classification task, training for supervised learning works by giving the neuron a large number of observations (problems) along with their classes (solutions). We expect that the neuron will somehow learn how to get from an observation to its class. Then, if we give it an observation it has not seen before, it should classify it with reasonable success.
The behavior of a neuron for any input is determined by its weights and bias. When we start, we set them at random values; the neuron knows nothing, like a clueless student. We give the neuron one input in the form of an x_1, x_2 pair. The neuron will produce an output. As we have random weights and bias, the output will also be random. For each of our observations in the training data set, however, we do know what the correct answer from the neuron should be. We can then calculate how far off the neuron’s output is from the desired one. This is called the loss: a measure of how wrong the neuron is for a given input.
For example, if for an input the neuron produces as output the value 0.2, while the desired output is 1.0, we can calculate the loss from the difference between the two values. To avoid having to deal with signs, we usually take as the loss the square of the difference; here it would be (1.0 - 0.2)^2 = 0.64. If the desired output were 0.0, then the loss would be (0.2 - 0.0)^2 = 0.04. Be that as it may, having calculated the loss, we can now adjust the weights and bias so as to minimize it.
Going back to the human student, after each failed attempt to solve an exercise, we nudge them to perform better. The student figures out that they have to change their approach a bit and try with the next example. If they fail, we nudge them again. And again. Until after a lot of examples in the training data set, they will start getting things right more and more, and will be able to tackle the test data set.
When a student learns, neuroscience tells us that the wiring inside the brain changes; some synapses between neurons get stronger, some get weaker, and some are dropped. There is no direct equivalent to an artificial neuron, but something similar happens. Recall once more that the behavior of a neuron depends on its input, weights, and bias. We have no control over the input; it comes from the environment. But we can change the weights and biases. And this is what really happens. We update the values of the weights and bias in such a way that the neuron will minimize its errors.
The way that the neuron achieves that is by taking advantage of the nature of the task it is called to perform. We want it to take each observation, calculate an output corresponding to a class, and adjust its weights and bias to minimize its loss. So the neuron is trying to solve a minimization problem. Given an input and the output it produces, the problem is, How are we to recalibrate the weights and bias to minimize the loss?
This requires a conceptual change of focus. Up to this point we have described a neuron as something that takes some inputs and produces an output. Viewed in this way, the whole neuron is a big function that takes its inputs, applies the weights, sums the products, adds the bias, passes the result through the activation function, and produces the final output. But if we think of it another way, our inputs and outputs are actually given (that is our training data set), while what we can change are the weights and bias. So we can view the whole neuron as a function whose variables are the weights and bias because these are what we can really affect, and for every input we want to change them so as to minimize the loss.
If we take as an illustration a simple neuron, with just one weight and no bias, then the relationship between the loss and the weight might be as in the left part of the figure on the next page. The thick curve shows the loss as a function of the weight for a given input. The neuron should adjust its weight so that it reaches the minimum value of the function. The neuron, for the given input, currently has a loss at the indicated point. Unfortunately, the neuron does not know the ideal weight that would minimize the loss, given that the only thing it does know is the value of the function at the indicated point; it is not endowed with a vantage point like the one we have with the figure at our disposal. The neuron may only adjust its weight by a small amount—either increase or decrease it—so that it moves closer to the minimum.
To find out what to do, whether to increase or decrease the weight, the neuron can find the tangent line at the current point. Then it can calculate the slope of the tangent line; this is the angle with the horizontal axis, which we have also shown in the figure. Note that the neuron can do that without any special capabilities apart from being able to carry out calculations at the local point. The slope of the tangent is negative because the angle is clockwise. The slope shows the rate of change of a function; therefore a negative slope indicates that by increasing the weight, the loss decreases. The neuron thereby discovers that to decrease the loss, it has to move to the right. As the slope is negative and the required change in the weight is positive, the neuron finds that it must move the weight in a positive direction—opposite to what is indicated by the slope.
Now turn to the figure on the right. This time the neuron is to the right of the minimum loss. It takes the tangent again and calculates its slope. The angle and therefore slope is positive. A positive slope indicates that by increasing the weight, the loss increases. The neuron then knows that in order to minimize the loss, it has to decrease the weight. As the slope is positive and the required change in the weight is negative, the neuron finds again that it must move in the opposite direction than that indicated by the slope.
In both cases, then, the rule is the same: the neuron calculates the slope and updates the weight in the opposite direction from the slope. All this might look familiar from calculus. The slope of a function at a point is its derivative. To decrease the loss, we need to change the weight by a small amount that is opposite to the derivative of the loss.
Now a neuron does not usually have a single weight but rather has many, and also has a bias. To find out how to adjust each individual weight and the bias, the neuron proceeds like we described for the single weight. In mathematical terms, it calculates the so-called partial derivative of the loss with respect to each individual weight and the bias. For n weights and a bias, that will be n + 1 partial derivatives in total. A vector containing all the partial derivatives of a function is called its gradient. The gradient is the equivalent of the slope when we have multivariable functions; it shows the direction along which we have to move to increase the value of the function. To decrease it, we move in the opposite direction. Thus to decrease the loss, the neuron updates each weight and the bias in the opposite direction than the one indicated by the partial derivatives forming its gradient.5
The calculations are not really performed by drawing tangents and measuring angles. There are efficient ways to find the partial derivatives and gradient, but we don’t need to get into the details. What is important is that we have a well-defined way to adjust the weights and bias to improve the results of the neuron. With this at hand, the learning process can be described by the following algorithm:
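The learning loop can be sketched in code for a single sigmoid neuron. The blobs, learning rate, and number of passes below are illustrative choices, not the book's exact listing:

```python
import math
import random

random.seed(1)

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Two made-up, linearly separable blobs: class 0 around (-2, -2),
# class 1 around (2, 2).
data  = [((random.gauss(-2, 0.5), random.gauss(-2, 0.5)), 0.0) for _ in range(50)]
data += [((random.gauss(2, 0.5),  random.gauss(2, 0.5)),  1.0) for _ in range(50)]

w1, w2, b = random.random(), random.random(), random.random()  # random start
lr = 0.5                                                       # small step size

for epoch in range(20):                          # repeat over the training set
    for (x1, x2), y in data:
        out = sigmoid(w1 * x1 + w2 * x2 + b)     # forward: weighted input + bias
        d = 2 * (out - y) * out * (1 - out)      # derivative of the squared loss
        w1 -= lr * d * x1                        # move opposite the gradient
        w2 -= lr * d * x2
        b  -= lr * d

correct = sum((sigmoid(w1 * x1 + w2 * x2 + b) > 0.5) == (y == 1.0)
              for (x1, x2), y in data)
print(correct, "of", len(data), "classified correctly")
```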
Once we have completed a training by going through all the data in the training data set, we say that we have completed an epoch. Usually we do not leave it at this. We repeat the whole process for a number of epochs; it is as if the student, after going through all the study material, started all over again. We expect that the next time they’ll do better, as this time they do not start from zero—they are not completely clueless—having already learned something from the previous epoch.
The more we repeat the training by adding epochs into our training regime, the better we get with the training data. But too much training can be a bad thing. A student who studies again and again the same set of problems will probably learn to solve them by rote—without really knowing how to solve any other problems that they have not encountered before. We see that happening when a seemingly well-prepared student fails abysmally in the exams. In machine learning, when we train the computer on a training data set, we say that it fits the data. Too much training results in what is called overfitting: excellent performance with the training data set, and bad performance with the test data set.
It can be proven that following this algorithm, a neuron can learn to classify any data that are linearly separable. If our data have two dimensions (like our example), then that means that they should be separable by a straight line. If our data have more features, not just x_1 and x_2, the principle generalizes. For three dimensions—that is, three inputs x_1, x_2, x_3—the data are linearly separable if they can be separated by a simple plane in three-dimensional space. For more dimensions, we call the equivalent of the line and the plane a hyperplane.
At the end of the training, our neuron has learned to separate the data. “Learned” means that it has found the right weights and bias, in the way we described: it started out with random values and then gradually updated them, minimizing the loss. Recall the figure with the two blobs, which the neuron learned to separate with a decision boundary. We got from the neuron below at the left to the neuron at the right, where you can see the final values of its parameters.
That does not always happen. A single neuron, acting alone, can only perform certain tasks, like this classification of linearly separable data. To handle more complicated tasks, we need to move from a lone artificial neuron to networks of neurons.
As in biological neural networks, we can build artificial neural networks out of interconnected neurons. The input signals of a neuron can be connected to the outputs of other neurons, and its output signal can be connected to the inputs of other neurons. In this way we can create neural networks like this one:
This artificial neural network has its neurons arranged in layers. This is often done in practice: many neural networks that we construct are made of layers of neurons, with each layer stacked next to a previous one. We have also made all the neurons on one layer connect to all the neurons on the next layer, going from left to right. This, again, is common, although not necessary. When we have layers connected like that, we call them densely connected.
Just as the first layer is not connected to any previous layer, the output of the last layer is not connected to any following layer. The output of the last layer is the output of the whole network; it will provide the values that we want it to calculate.
Let us return to a classification task. Our problem now is to pick apart two sets of data, shown in the figure on the top of the next page. The data fall into concentric circles. It is clear to a human that they belong to two distinct groups. It is also clear that they are not linearly separable: no straight line can separate the two classes. We want to create a neural network that will be able to tell the two groups apart so that it will tell us in which group any future observation will belong. This is what you see in the figure at the bottom. For any observation on the light background, the neural network will recognize that it belongs to one group; for any observation on the dark background, it will tell us that it belongs to the other group.
To achieve the results that we see in the lower figure, we build a network layer by layer. We put two neurons on the input layer, one for each coordinate of our data. We add one layer with four neurons, densely connected to the input layer. Because this layer is not connected to the input or output, it is a hidden layer. We add another hidden layer with two neurons, densely connected to the first hidden layer. We finish the network with an output layer of one neuron, densely connected to the last hidden layer. All the neurons use the tanh activation function. The output neuron will produce a value between -1 and +1, displaying its belief that the data fall in one or the other group. We’ll take that value and turn it into a binary decision, yes or no, depending on whether it exceeds 0.0 or not. This is what the neural network looks like:
In the beginning, the neural network knows nothing, and no adjustment has taken place; we start with random weights and biases. This is what ignorance means in the neural network world. Then we give the neural network an observation from our data—that is, a set of coordinates. The x_1 and x_2 coordinates will go on the input layer. Both neurons take the x_1 and x_2 values and pass them as their output to the first hidden layer. All four neurons of that layer calculate their outputs, which, in their turn, they send to the second hidden layer. The neurons on that layer send their own output to the neuron on the output layer, which produces the final output value of the neural network. As the calculations proceed from layer to layer, the neural network propagates the results of the neurons forward, from the input to the output layer:
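The forward propagation just described can be sketched as follows; the layer sizes mirror the 2-4-2-1 tanh network in the text, with random (untrained) weights, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

def dense(n_in, n_out):
    """A densely connected layer: random weights and biases,
    as at the start of training."""
    return rng.normal(size=(n_out, n_in)), rng.normal(size=n_out)

# Three weighted layers: 2 inputs -> 4 hidden -> 2 hidden -> 1 output.
layers = [dense(2, 4), dense(4, 2), dense(2, 1)]

def forward(x):
    # Propagate the signal layer by layer, from input to output.
    for W, b in layers:
        x = np.tanh(W @ x + b)   # each neuron: weighted sum, plus bias, then tanh
    return x

out = forward(np.array([0.3, -0.7]))  # one observation's coordinates
print(out)                            # a single value between -1 and +1
```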
Once we reach the output layer, we calculate the loss, as we did with the single neuron. And then we want to adjust the weights and bias of not just one neuron but rather all the neurons in the network so as to minimize the loss.
It turns out that it is possible to do that by going in the opposite direction, from the output to the input layer. Once we know the loss, we can update the weights and biases of the neurons on the output layer (here we have just a single neuron, but this need not always be so). Having updated the neurons on the output layer, we can update the weights and biases of the neurons on the layer before that—the last hidden layer. Having done that, we can update the weights and biases of the layer before that—the one-but-last hidden layer. And so on, until we reach the input layer:
The way the weights and biases of the neurons are updated is similar to the way a single neuron is updated. Again, the updates are calculated based on mathematical derivatives. You can think of the whole neural network as an enormous function whose variables are the weights and biases of all the neurons. Then we can calculate the derivative of each and every weight and bias with respect to the loss, and use that derivative to update the neuron. With this we arrive at the heart of the learning process in neural networks: the backpropagation algorithm.6
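One backpropagation update can be sketched on a tiny two-layer tanh network; the sizes, input, and learning rate below are all illustrative, and the point is only that one step against the gradient reduces the loss:

```python
import numpy as np

rng = np.random.default_rng(0)

# A miniature network: 2 inputs -> 2 hidden tanh neurons -> 1 tanh output.
W1 = rng.normal(size=(2, 2)); b1 = np.zeros(2)   # hidden layer parameters
W2 = rng.normal(size=(1, 2)); b2 = np.zeros(1)   # output layer parameters

x = np.array([0.5, -0.3])   # one observation
y = 1.0                     # desired output

def forward(x):
    h = np.tanh(W1 @ x + b1)         # hidden activations
    o = np.tanh(W2 @ h + b2)         # network output
    return h, o

h, o = forward(x)
loss_before = (o[0] - y) ** 2

# Backward pass: apply the chain rule layer by layer, output to input.
d_o = 2 * (o - y) * (1 - o ** 2)     # derivative at the output neuron
dW2 = np.outer(d_o, h); db2 = d_o
d_h = (W2.T @ d_o) * (1 - h ** 2)    # propagate the error to the hidden layer
dW1 = np.outer(d_h, x); db1 = d_h

lr = 0.1                             # a small step against the gradient
W2 -= lr * dW2; b2 -= lr * db2
W1 -= lr * dW1; b1 -= lr * db1

_, o = forward(x)
loss_after = (o[0] - y) ** 2
print(loss_after < loss_before)      # the update reduced the loss
```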
Using the backpropagation algorithm, we can build complex neural networks and train them to perform different tasks. The building blocks of deep learning systems are simple. They are artificial neurons, with their limited computational capabilities: taking inputs, multiplying by weights, summing, adding a bias, and applying an activation function on the resulting value. Their power derives from connecting lots and lots of them in special ways, where the resulting networks can be trained to perform the task that we want them to perform.
To render the discussion more concrete, let us assume that we want to build a neural network that recognizes items of clothing displayed in images, so this is going to be an image recognition task. Neural networks have been found to be exceptionally good at this.
Each image will be a small photo, of dimensions 28 × 28. Our training data set consists of 60,000 images, and our test data set consists of 10,000 images; we’ll use the 60,000 images for training the neural network and the other 10,000 images for evaluating how well it learned. Here is an example image, on which we have added axes and a grid to help the discussion that follows:7
The image is broken into small distinct parts because that is how we handle images digitally. Taking the whole image as a rectangular plot, we divide it into small patches, 28 × 28 = 784 of them, and each patch is given an integer value from 0 to 255, corresponding to a shade of gray, with 0 being completely white and 255 being completely black. The above image is actually the matrix on the following page.
In reality, neural networks usually require that we scale their inputs to a small range of values, such as between 0 and 1, otherwise they may not work well; you may think of it as large input values leading neurons astray. That means that before using this matrix we would divide each cell by 255, but we’ll ignore this in the rest of the discussion.
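The scaling step itself is trivial; with a few sample values from the matrix:

```python
# Scale the 0-255 gray values down to the 0-1 range before feeding the network.
patch_values = [0, 58, 128, 255]          # sample values from the image matrix
scaled = [v / 255 for v in patch_values]
print(scaled[0], scaled[-1])  # 0.0 1.0
```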
The different items of clothing may belong to ten different classes, which you can see in the table below. To a computer, the classes are just different numbers, which we call labels:
| Label | Class | Label | Class |
|---|---|---|---|
| 0 | T-shirt/top | 5 | Sandal |
| 1 | Trouser | 6 | Shirt |
| 2 | Pullover | 7 | Sneaker |
| 3 | Dress | 8 | Bag |
| 4 | Coat | 9 | Ankle boot |
In the following figure, we show a random sample of ten items from each kind of clothing. There is quite a variety in the images, as you can see, and not all of them are picture-perfect examples of each particular clothing class. That makes the problem somewhat more interesting. We want to create a neural network that takes as its input images like these and provides an output that tells us what kind of image it believes its input is.
同样,我们将分层构建神经网络。第一层包含输入神经元,共有 784 个神经元。每个神经元都从图像中的单个小块获取单个输入,并原样输出它得到的值。如果图像是踝靴,则第一个神经元的输入将是左上角小块中的值 0,并输出这个 0。其余神经元将按行顺序(从上到下、从左到右)获取各小块的值。靴子后跟右端(从下数第四行、从右数第三列)值为 58 的小块对应的神经元将获取这个 58 并将其复制到输出。由于神经网络的行列是从上边和左边开始计数的,该小块位于从上数第 25 行、从左数第 26 列,因此对应的是第 24 × 28 + 26 = 698 号输入神经元。
Again, we'll build our neural network in layers. The first layer, comprising the input neurons, will have 784 neurons. Each one of them will take a single input, from a single patch in the image, and will simply output the value that it gets in its input. If the image is the ankle boot, the first neuron will get the value in the top-left patch, a 0, in its input, and it will output that 0. The rest of the neurons will get the values of the patches proceeding row wise, from top to bottom, left to right. The patch with the value 58, at the right end of the heel of the boot (the fourth row from the bottom, and the third column from the right) will get this 58 and copy it on its output. As rows and columns are counted in the neural network from the top and left, this neuron is in the twenty-fifth row from the top and twenty-sixth column from the left, making it input neuron number 24 × 28 + 26 = 698.
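把小块的行列位置换算成输入神经元编号只是简单算术(示意性的 Python;行列从 1 开始计数):
Converting a patch's row and column to its input neuron number is simple arithmetic (illustrative Python; rows and columns counted from 1):

```python
def input_neuron_index(row, col, width=28):
    """Index of an input neuron when patches are fed row-wise."""
    return (row - 1) * width + col

# The patch with the value 58: row 25 from the top, column 26 from the left.
print(input_neuron_index(25, 26))  # 698
```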
下一层将与输入层紧密连接。它由 128 个 ReLU 神经元组成。该层不直接连接到输入图像(输入层直接连接),也不会直接连接到输出(我们将为此再添加一层)。因此,它是一个隐藏层,因为我们无法从神经网络外部观察到它。由于是紧密连接的,输入层和隐藏层之间会有大量的连接。隐藏层上的每个神经元都将连接到输入层上所有神经元的输出。每个神经元有 784 个输入连接,总共有 784 × 128 = 100,352 个连接。
The next layer will be densely connected to the input layer. It will consist of 128 ReLU neurons. This layer is not directly connected to the input images (the input layer is) and will not be directly connected to the output (we'll add another layer for that). Therefore it is a hidden layer, as we cannot observe it from the outside of the neural network. Being densely connected, this will result in a large number of connections between the input and hidden layer. Each neuron on the hidden layer will be connected to the outputs of all neurons on the input layer. There will be 784 input connections per neuron, for a total of 784 × 128 = 100,352 connections.
我们将再添加最后一层,它将包含输出神经元,用于传递神经网络的结果。它将包含 10 个神经元,每个类别一个。每个输出神经元将连接到隐藏层的所有神经元,总共有 128 × 10 = 1,280 个连接。神经网络中所有层之间的连接总数为 100,352 + 1,280 = 101,632 个。最终的神经网络以示意图的形式呈现,类似于下一页的示意图。由于不可能把所有节点和边都画进去,图中用虚线框代表输入层和隐藏层上的大部分节点;第一个框中有 780 个节点,第二个框中有 124 个节点。我们还折叠了指向框内各个节点的箭头。
We will add another, last layer, which will contain the output neurons that will carry the results of the neural network. This will contain 10 neurons, one for each class. Each output neuron will be connected to all the neurons of the hidden layer, for a total of 128 × 10 = 1,280 connections. The grand total of all the connections between all the layers in the neural network will be 100,352 + 1,280 = 101,632. The resulting neural network will look, in schematic form, like the one on the next page. As it is impossible to fit all the nodes and edges, you can see dotted boxes standing for the bulk of nodes on the input and hidden layers; there are 780 nodes in the first box and 124 nodes in the second box. We have also collapsed the arrows going to the individual nodes inside the boxes.
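这些连接数本身只是简单的算术(示意性的 Python 片段):
The connection counts themselves are simple arithmetic (an illustrative Python snippet):

```python
inputs, hidden, outputs = 784, 128, 10

input_to_hidden = inputs * hidden    # dense: every input feeds every hidden neuron
hidden_to_output = hidden * outputs  # dense: every hidden neuron feeds every output

print(input_to_hidden)                     # 100352
print(hidden_to_output)                    # 1280
print(input_to_hidden + hidden_to_output)  # 101632
```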
我们的神经网络的输出将由 10 个输出组成,分别来自该层上的每个神经元。每个输出神经元代表一个类别,其输出表示输入图像属于该类别的概率;所有 10 个神经元的概率之和为 1,这是处理概率时必须满足的条件。这里用到了另一种激活函数,称为 softmax,它以实数向量作为输入,并将其转换为一个概率分布。让我们看接下来的两个例子。
The output of our neural network will consist of 10 outputs, one from each neuron on the layer. Each output neuron will represent one class, and its output will represent the probability that the input image belongs to this class; the sum of the probabilities of all 10 neurons will be 1, as must be the case when we deal with probabilities. This is an example of yet another activation function, called softmax, which takes as input a vector of real numbers and converts them to a probability distribution. Let's see the two examples that follow.
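softmax 函数本身只需几行代码(假设使用 Python 和 NumPy 的示意实现):
The softmax function itself takes only a few lines (an illustrative sketch assuming Python with NumPy):

```python
import numpy as np

def softmax(z):
    # Subtracting the maximum changes nothing mathematically,
    # but avoids numerical overflow for large inputs.
    e = np.exp(z - np.max(z))
    return e / e.sum()

p = softmax(np.array([3.0, 1.0, 0.2]))
print(p)                  # the largest input gets the largest probability
print(round(p.sum(), 6))  # 1.0: a valid probability distribution
```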
在左边的第一个例子中,经过训练后我们在网络的输出中得到了这个:
In the first example, on the left, after training we get this at the output of the network:
| 输出神经元 Output neuron | 类别 Class | 概率 Probability |
|---|---|---|
| 1 | T恤/上衣 T-shirt/top | 0.09 |
| 2 | 裤子 Trouser | 0.03 |
| 3 | 套衫 Pullover | 0.00 |
| 4 | 裙子 Dress | 0.83 |
| 5 | 外套 Coat | 0.00 |
| 6 | 凉鞋 Sandal | 0.00 |
| 7 | 衬衫 Shirt | 0.04 |
| 8 | 运动鞋 Sneaker | 0.00 |
| 9 | 包 Bag | 0.01 |
| 10 | 踝靴 Ankle boot | 0.00 |
这意味着神经网络告诉我们,它非常肯定它正在处理一件连衣裙,其概率为 83%,而输入图像是 T 恤/上衣、衬衫或裤子的概率则很小。
That means that the neural network tells us that it is pretty certain it is dealing with a dress, giving it an 83 percent probability, leaving aside small probabilities for the input image being a T-shirt/top, shirt, or trouser.
在右侧的第二个示例中,网络产生:
In the second example, on the right, the network produces:
| 输出神经元 Output neuron | 类别 Class | 输出 Output |
|---|---|---|
| 1 | T恤/上衣 T-shirt/top | 0.00 |
| 2 | 裤子 Trouser | 0.00 |
| 3 | 套衫 Pullover | 0.33 |
| 4 | 裙子 Dress | 0.00 |
| 5 | 外套 Coat | 0.24 |
| 6 | 凉鞋 Sandal | 0.00 |
| 7 | 衬衫 Shirt | 0.43 |
| 8 | 运动鞋 Sneaker | 0.00 |
| 9 | 包 Bag | 0.00 |
| 10 | 踝靴 Ankle boot | 0.00 |
神经网络有 43% 的把握认为它正在处理一件衬衫——但它错了;这张照片实际上是一件套头衫(以防你看不出来)。不过,它还是给出了第二高的准确率,33%,认为这张照片是一件套头衫。
The neural network is 43 percent certain that it is dealing with a shirt—and it is wrong; the photo is really a picture of a pullover (in case you couldn’t tell). Still, it did give its second best, at 33 percent, to the image being a pullover.
我们给出了一个网络得出正确答案的例子,以及另一个网络得出了错误的答案。总的来说,如果我们给网络输入大量图像进行识别,包括训练数据集中的所有 60,000 幅图像,我们会发现它在测试数据集的 10,000 幅图像中大约能识别出 86% 的正确率。考虑到这个神经网络虽然比之前的复杂得多,但仍然很简单,这个结果还不错。基于这个基础,我们可以创建更复杂的网络结构,从而获得更好的结果。
We gave one example where the network comes up with the right answer, and another instance where the network comes up with the wrong answer. Overall, if we give the network many images to recognize, all the 60,000 images in our training data set, we’ll find out that it manages to get right about 86 percent of the 10,000 images in the test data set. That is not bad, considering that the neural network, even though it is way more complicated than the previous one, is still a simple one. From this baseline, we can create more complicated network structures that would give us better results.
尽管复杂度有所提升,我们的神经网络的学习方式与之前识别数据团块和同心圆的简单网络相同。训练期间,每个输入都会得到一个输出,我们将其与期望输出进行比较,从而计算损失。现在的输出不再是单一值,而是 10 个值,但原理相同。当神经网络以约 83% 的概率识别出一条连衣裙时,我们可以将其与理想情况(即以 100% 的概率识别出它)进行比较。因此,我们有两组输出值:一组是网络得到的,不同种类的衣服被赋予不同的概率;另一组是我们希望从网络得到的,即一组概率,其中除了对应正确答案的那个概率等于 1 之外,其他所有概率都为零。在最后一个示例中,输出与目标的对比如下:
Despite the increased complexity, our neural network learns in the same way as our simpler networks recognizing blobs of data and concentric circles. For each input during training we obtain an output, which we compare to the desired output to calculate the loss. The output now is not a single value but rather 10 values, yet the principle is the same. When the neural network recognizes a dress with about 83 percent probability, we can compare that with the ideal, which would be to recognize it with 100 percent probability. Therefore we have two sets of output values: the one obtained by the network, with various probabilities assigned to the different kinds of clothes, and what we would like to have gotten from the network, which is a set of probabilities where all of them are zero apart from a single probability, corresponding to the right answer, which is equal to one. In the last example, the output contrasted to the target would be as follows:
| 输出神经元 Output neuron | 类别 Class | 输出 Output | 目标 Target |
|---|---|---|---|
| 1 | T恤/上衣 T-shirt/top | 0.00 | 0.00 |
| 2 | 裤子 Trouser | 0.00 | 0.00 |
| 3 | 套衫 Pullover | 0.33 | 1.00 |
| 4 | 裙子 Dress | 0.00 | 0.00 |
| 5 | 外套 Coat | 0.24 | 0.00 |
| 6 | 凉鞋 Sandal | 0.00 | 0.00 |
| 7 | 衬衫 Shirt | 0.43 | 0.00 |
| 8 | 运动鞋 Sneaker | 0.00 | 0.00 |
| 9 | 包 Bag | 0.00 | 0.00 |
| 10 | 踝靴 Ankle boot | 0.00 | 0.00 |
我们取最后两列,再次计算损失指标——只是这一次,由于我们没有单个值,所以我们不计算简单的平方差。存在一些指标可以计算此类值集之间的差异。在我们的神经网络中,我们使用了一种称为分类交叉熵的指标,它表示两个概率分布的差异程度。计算损失后,我们更新输出层的神经元。更新后,我们更新隐藏层的神经元。简而言之,我们进行反向传播。
We take the last two columns and we calculate again a loss metric—only this time, as we do not have a single value, we do not calculate a simple squared difference. There exist metrics to calculate the difference between sets of values like these. In our neural network we used one such metric, called categorical cross-entropy, which indicates how much two probability distributions differ. Having calculated the loss, we update the neurons on the output layer. Having updated them, we update the neurons on the hidden layer. In short, we perform backpropagation.
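作为示意,我们可以用上表的输出和目标两列来计算分类交叉熵(假设使用 Python 和 NumPy;这只是该指标的一个简化实现):
As an illustration, we can compute the categorical cross-entropy from the output and target columns of the table above (a sketch assuming Python with NumPy; this is just a simplified implementation of the metric):

```python
import numpy as np

def categorical_cross_entropy(target, predicted, eps=1e-12):
    # Clip the predictions so that log(0) never occurs.
    predicted = np.clip(predicted, eps, 1.0)
    return -np.sum(target * np.log(predicted))

predicted = np.array([0.0, 0.0, 0.33, 0.0, 0.24, 0.0, 0.43, 0.0, 0.0, 0.0])
target    = np.array([0.0, 0.0, 1.00, 0.0, 0.00, 0.0, 0.00, 0.0, 0.0, 0.0])

loss = categorical_cross_entropy(target, predicted)
# Only the pullover entry contributes: loss = -log(0.33).
print(round(loss, 2))  # 1.11
```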
我们对训练数据集中的所有图像(也就是一整个 epoch)都进行同样的处理。完成后,我们再从头重复这一过程,进行下一个 epoch。我们重复这个过程,并努力达到一种平衡:epoch 要足够多,使神经网络能够尽可能多地从训练数据集中学习;但又不能太多,以免神经网络从训练数据集中学得过头。在学习过程中,神经网络会调整其神经元的权重和偏差,而这些参数的数量非常多。输入层只是把值复制给隐藏层,因此无需对输入神经元做任何调整;但隐藏层有 100,352 个权重,输出层有 1,280 个权重,隐藏层有 128 个偏差,输出层有 10 个偏差,总共有 101,770 个参数。
We go through the same process for all images in our training data set—that is, for a whole epoch. When we are done, we do this all over again for another epoch. We repeat the process while trying to strike a balance: enough epochs so that the neural network will learn as much as possible from the training data set without going into too many epochs where the neural network will learn too much from the training data set. During learning, the network will be adjusting the weights and biases of its neurons, which are a lot. The input layer just copies values to the hidden layer, so no adjustments need to be done to the input neurons, but there are 100,352 weights on the hidden layer, 1,280 weights on the output layer, 128 biases on the hidden layer, and 10 biases on the output layer, for a total of 101,770 parameters.
可以证明,尽管单个神经元本身做不了什么,但神经网络可以执行任何能够用算法描述并在计算机上运行的计算任务。因此,计算机能做的,神经网络都能做到。当然,其核心思想是,我们不需要确切地告诉神经网络如何执行任务。我们只需要向它输入样本,同时使用一种算法让神经网络学会如何执行任务。我们已经看到,反向传播就是这样一种算法。虽然我们的示例仅限于分类任务,但神经网络可以应用于各种不同的任务。它们可以预测目标量的值(例如信用评分),在语言之间进行翻译,以及理解和生成语音;还能在围棋比赛中击败人类冠军,并在此过程中展示这项古老游戏的全新下法,令专家们大为惊讶。它们甚至在只掌握规则、无法访问先前对局库的情况下,通过与自己对弈来学会下围棋。8
It can be proven that even though a neuron on its own cannot do much, a neural network can perform any computational task that can be described algorithmically and run on a computer. Therefore there is nothing that a computer can do that a neural network could not do. The whole idea, of course, is that we do not need to tell the neural network exactly how to perform a task. We only need to feed it with examples while using an algorithm to make the neural network learn how to perform the task. We saw that backpropagation is such an algorithm. Although we limited our examples to classification, neural networks can be applied to all sorts of different tasks. They can predict the values of a target quantity (for instance, credit scoring), translate between languages as well as understand and generate speech; and beat human champions in the game of Go, in the process baffling experts by demonstrating completely new strategies of playing a centuries-old game. They have even learned how to play the game of Go starting with just a knowledge of the rules, without access to a library of previously played games, and then proceeding to learn as if the neural network were playing games against itself.8
如今,神经网络的成功应用比比皆是,但其原理却并非新鲜事物。感知器发明于 20 世纪 50 年代,反向传播算法也已有 30 多年的历史。在此期间,神经网络的兴起和衰落,人们对其潜力的热情也随之消退。过去几年真正发生改变的是我们构建真正大型神经网络的能力。这得益于制造专用计算机芯片的进步,这些芯片可以高效地执行神经元执行的计算。如果将神经网络的所有神经元想象成排列在计算机内存中的阵列,那么所有必要的计算都可以通过对庞大的数字矩阵进行运算来完成。神经元计算其输入加权乘积的和;如果你还记得上一章关于 PageRank 的讨论,乘积的和正是矩阵乘法的本质。
Today, successful applications of neural networks abound, yet the principles are not new. The Perceptron was invented in the 1950s, and the backpropagation algorithm is more than 30 years old. In this period, neural networks came and went out of fashion, with enthusiasm for their potential ebbing and flowing. What has really changed in the last few years is our capability to build really big neural networks. This has been achieved thanks to the advances in manufacturing specialized computer chips that can perform the calculations executed by neurons efficiently. If you picture all the neurons of a neural network arranged inside a computer’s memory, then all the required computations can be carried out by operations on vast matrices of numbers. A neuron calculates the sums of the weighted products of its inputs; if you recall the discussion on PageRank in the previous chapter, the sum of the products is the essence of matrix multiplication.
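换句话说,一整层神经元的计算就是一次矩阵乘法再加上激活函数。下面是一个示意(假设使用 Python 和 NumPy;权重和偏差只是随机的占位值):
In other words, the computation of a whole layer of neurons is one matrix multiplication followed by the activation function. Here is a sketch (assuming Python with NumPy; the weights and biases are just random placeholder values):

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random(784)                        # one flattened, scaled 28x28 image
W = rng.standard_normal((128, 784)) * 0.01 # placeholder hidden-layer weights
b = np.zeros(128)                          # placeholder biases

# Each hidden neuron: ReLU of the weighted sum of all 784 inputs,
# computed for the whole layer at once as a matrix-vector product.
hidden = np.maximum(W @ x + b, 0.0)
print(hidden.shape)  # (128,)
```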
事实证明,图形处理单元( GPU ) 非常适合此目的。GPU 是专门设计用于在计算机内部创建和处理图像的计算机芯片;该术语建立在中央处理器( CPU ) 的基础上,CPU 是执行计算机内部程序指令的芯片。GPU 旨在执行计算机图形指令。计算机图形的生成和处理需要对大型矩阵进行数值运算;计算机生成的场景就是一个巨大的数字矩阵(想一想鞋子)。GPU 是游戏机的主力。这种在数小时的娱乐中让人类智力停滞的技术也用于提升机器智能。
It has turned out that graphics processing units (GPUs) are perfectly suited for this. GPUs are computer chips that are specially designed to create and manipulate images inside a computer; the term builds on central processing units (CPUs), the chip that carries out the instructions of a program inside a computer. GPUs are built to carry out instructions for computer graphics. The generation and processing of computer graphics requires numerical operations on big matrices; a computer-generated scene is a big matrix of numbers (think of the shoe). GPUs are the workhorses of game consoles. The same technology that arrests human intelligence in hours of diversion is also used to advance machine intelligence.
我们从最简单的神经网络开始,它由一个神经元组成。然后我们添加了几个神经元,之后又添加了几百个神经元。尽管如此,我们创建的图像识别神经网络规模并不大,其架构也并不复杂。我们只是一层一层地叠加神经元。深度学习领域的研究人员在设计神经网络结构方面取得了长足的进步。这些架构可能包含数十层。这些层的几何形状也不必像我们这里的那样,只是简单的一维神经元集合。例如,层内的神经元可以排布在二维的画布状结构上。此外,不必将每一层都与前一层紧密连接;其他连接模式也是可能的。也不必把某一层的输出简单地接到下一层的输入。例如,我们可以在非连续的层之间建立连接。我们可以把若干层捆绑在一起,将其视为模块,再与包含其他层的模块组合,形成越来越复杂的配置。如今,我们拥有种类繁多的神经网络架构可供选择,其中一些架构特别适合特定的任务。
We started with the simplest possible neural network, consisting of a single neuron. Then we added a few neurons, and then we added a few more hundreds. Still, the image recognition neural network that we created is by no means a big one. Nor is its architecture complicated. We just added layer on layer of neurons. Researchers in the field of deep learning have made big strides in devising neural network designs. These architectures may comprise dozens of layers. The geometry of these layers need not be a simple one-dimensional set of neurons, like the ones we have here. For example, neurons inside a layer may be stacked on two-dimensional canvas-like structures. Moreover, it is not necessary to have each layer densely connected to the one before; other connection patterns are possible. Nor is it necessary to have the outputs of a layer simply connected to the inputs of the next layer. We may, for instance, have connections between non-consecutive layers. We may bundle up layers and treat them as modules, combining them with modules containing other layers to form more and more complex configurations. Today we have a menagerie of neural network architectures at our disposal, such that particular architectures are well suited for specific tasks.
在所有神经网络架构中,各层上的神经元都会在学习过程中更新权重和偏差的值。如果我们反思一下正在发生的事情,就会发现,是一组输入在学习过程中改变了各层。训练停止后,各层已经通过参数的调整,以某种方式吸收了输入数据所代表的信息。一个层的权重和偏差配置代表了它所接收到的输入。第一个隐藏层与输入层直接接触,编码神经网络的输入。第二个隐藏层编码与其直接连接的第一个隐藏层的输出。随着我们逐渐深入多层网络,每一层都编码前一层传来的输出。每个表示都建立在前一个表示的基础上,因此比前一层处于更高的抽象层次。因此,深度神经网络学习的是一个概念的层次结构,不断迈向越来越高的抽象层次。我们正是在这个意义上谈论深度学习。我们指的是这样一种架构:连续的层次代表更深的概念,对应更高的抽象层次。在图像识别中,多层网络的第一层可以学习识别小的局部模式,例如图像中的边缘。第二层则可以学习识别由第一层识别出的模式构建的模式,例如眼睛、鼻子和耳朵。第三层可以学习识别由第二层识别出的模式构建的模式,例如人脸。现在您可以看到,我们用于识别图像的神经网络有些简单幼稚;我们并没有尝试实现真正的深度学习。
The neurons on the layers in all the neural network architectures update the values of the weights and biases as they learn. If we reflect on what is happening, we can see that we have a set of inputs that transforms the layers during the learning process. Once the training stops, the layers have somehow, via the adjustments in their parameters, taken in the information represented by the input data. The weights and biases configuration of a layer represents the input it has received. The first hidden layer, which comes in direct contact with the input layer, encodes the neural network's input. The second hidden layer encodes the output of the first hidden layer, to which it is directly connected. As we proceed deeper and deeper into a multilayer network, each layer encodes the output received by the previous layer. Each representation builds on the previous one and therefore is on a higher level of abstraction from the one of the preceding layer. Deep neural networks, then, learn a hierarchy of concepts, proceeding to higher and higher levels of abstraction. It is in this sense that we talk of deep learning. We mean an architecture whereby successive levels represent deeper concepts, corresponding to higher levels of abstraction. In image recognition, the first layer of a multilayer network may learn to recognize small local patterns, such as edges in the image. Then the second layer may learn to recognize patterns that are built from the patterns recognized by the first layer, such as eyes, noses, and ears. The third layer may learn to recognize patterns that are built from the patterns recognized by the second layer, like faces. Now you can see that our neural network for recognizing the images was somewhat naive; we did not try to implement actual deep learning.
通过在抽象之上构建抽象,我们期望我们的网络能够找到人类能够发现的模式:从句子的结构,到医学图像中的恶性肿瘤迹象,到识别手写字符,再到检测网络欺诈。
By building abstractions on abstractions, we expect our network to find patterns that humans find, from structures in sentences, to signs of malignancy in medical images, to recognizing handwritten characters, to detecting online fraud.
然而,你可能会说,这一切都归结于在简单的构建块(人工神经元)上更新简单的值。你说得对。当人们意识到这一点时,有时会感到失望。他们想了解机器学习和深度学习是什么,而答案的简单性却令人失望:看似拥有人类能力的东西,竟然可以简化为一些基本操作。或许我们更愿意寻找更复杂的东西,这无疑会提升我们的自尊心。
Yet, you may say, it all boils down to updating simple values on simple building blocks—the artificial neurons. And you would be correct. When people realize that, sometimes they feel let down. They want to learn what machine and deep learning are, and the simplicity of the answer disappoints: something that appears to have human capabilities can be reduced to fundamentally elementary operations. Perhaps we would prefer to find something more involved, which would not fail to flatter our self-esteem.
然而,我们不应忘记,在科学领域,我们相信自然可以用第一原理来解释,并努力寻找尽可能简单的原理。这并不排除由简单规则和基本构件衍生出复杂的结构和行为。人工神经元比生物神经元简单得多,即使生物神经元的工作原理可以用简单的模型来解释,但正是由于大量相互连接的生物神经元,我们所知的智能才得以产生。
We should not forget, however, that in science we believe that nature can be explained from first principles, and try to find such principles that are as simple as possible. That does not preclude complex structures and behaviors arising out of simple rules and building blocks. Artificial neurons are much simpler than biological ones, and even if the workings of biological neurons can be explained in simple models, it is thanks to the vast number of interconnected biological neurons that intelligence, as we know it, can arise.
这有助于我们正确看待一些事情。诚然,人工神经网络的潜力惊人。然而,要让它们发挥作用,需要大量的人类创造力和巨大的工程努力。我们在这里仅仅触及了皮毛。例如,以反向传播为例。这是神经网络背后的基本算法,使我们能够高效地执行本质上是求数学导数的过程。研究人员一直在忙于设计高效的计算技术,例如自动微分,这是一种已被广泛采用的计算导数的机制。或者以计算神经网络参数变化的具体方法为例。各种不同的优化器已经被开发出来,使我们能够部署越来越大、同时也越来越高效的网络。谈到底层硬件,硬件工程师正在设计越来越好的芯片,以更少的算力消耗更快地运行更多的神经计算。纵观网络架构,新的神经网络架构不断被提出,在现有架构的基础上加以改进。这是研究和实验的温床,甚至包括构建用于设计其他神经网络的神经网络的努力。因此,每当你看到新闻报道某个神经网络取得了新的成就时,请向那些辛勤工作、让这一切成为可能的人们致敬。9
This helps put some things into perspective. True, artificial neural networks can be uncanny in their potential. In order to make them work, however, an amazing amount of human creativity and terrific engineering effort is required. We have only scratched the surface in our account here. For instance, take backpropagation. That is the fundamental algorithm behind neural networks, allowing us to perform efficiently what is at heart a process of finding mathematical derivatives. Researchers have been busy devising efficient calculation techniques, such as automatic differentiation, a mechanism for calculating derivatives that has been widely adopted. Or take the exact way that changes in the neural network parameters are calculated. Various different optimizers have been developed, allowing us to deploy bigger and bigger networks that are at the same time more and more efficient. Turning to the underlying hardware, hardware engineers are designing better and better chips to run more and more neural computations faster while using less computing power. Looking at network architectures, new neural network architectures are proposed that improve on existing ones. This is a hotbed of research and experimentation, and even encompasses efforts to build neural networks that design other neural networks. So every time you see a news report that a neural network has reached a new achievement, doff your hat to the hardworking people who made this possible.9
人工神经元比生物神经元简单得多,即使可以解释生物神经元的工作原理……,也正是由于大量相互连接的生物神经元,智能才得以产生。
Artificial neurons are much simpler than biological ones, and even if the workings of biological neurons can be explained . . . , it is thanks to the vast number of interconnected biological neurons that intelligence . . . can arise.
2019 年 7 月 15 日,英格兰银行行长马克·卡尼公布了新版 50 英镑纸币的设计方案,预计将在两年后开始流通。英格兰银行于 2018 年决定用新版纸币纪念一位科学人物,并启动了为期六周的公众提名期。最终,共收到 227,299 份提名,选出 989 位符合条件的科学人物。纸币人物咨询委员会从中遴选出 12 个候选名单。之后,行长做出最终决定,选择了艾伦·图灵。他评论道:“艾伦·图灵是一位杰出的数学家,他的工作对我们今天的生活方式产生了巨大的影响。作为计算机科学和人工智能之父,同时也是一位战争英雄,艾伦·图灵的贡献深远而具有开创性。图灵是一位巨人,如今许多人都站在他的肩膀上。” 1
On July 15, 2019, Mark Carney, the Bank of England governor, presented the design of the new £50 note, expected to enter circulation about two years later. The Bank of England had decided in 2018 to celebrate a character from science with the new banknote and opened a six-week public nomination period for the selection. It received a total of 227,299 nominations for 989 eligible characters. From this, the Banknote Character Advisory Committee decided on a short list of 12 options. Then the governor made the final decision, selecting Alan Turing. He commented, “Alan Turing was an outstanding mathematician whose work has had an enormous impact on how we live today. As the father of computer science and artificial intelligence, as well as war hero, Alan Turing’s contributions were far ranging and path breaking. Turing is a giant on whose shoulders so many now stand.”1
图灵 (1912-1954) 是一位天才,他探索了计算的极限和本质,预见了具有智能行为的机器的兴起,努力解决机器是否能够思考的问题,为数学生物学和形态发生机制做出了贡献,并在第二次世界大战期间对德国加密信息的密码分析中发挥了关键作用(他的贡献几十年来一直是个秘密)。悲剧的是,图灵自杀身亡。1952 年,他因同性恋被捕并被定罪,当时在英国同性恋是犯罪行为,他被迫接受激素治疗。2013 年,官方发布了赦免令。他出现在新版纸币上,是一种几十年前不可想象的平反。2
Turing (1912–1954) was a genius who explored the limits and nature of computation, foresaw the rise of machines that would display intelligent behavior, grappled with the question of whether machines could think, contributed to mathematical biology and mechanisms of morphogenesis, and played a crucial role in the cryptanalysis of encrypted German messages during World War II (his contribution remained a secret for decades). In a tragic turn of events, Turing died by suicide. He had been arrested and convicted in 1952 for homosexuality, which was criminal in the United Kingdom at the time, and compelled to get hormonal treatment. An official pardon was issued in 2013. His appearance on the new note is a form of rehabilitation that would have been unthinkable a few decades back.2
本书始终将算法描述为由简单的步骤组成,这些步骤足够基础,可以用纸笔完成。鉴于我们是在计算机程序中实现算法,那么“算法究竟是什么”这个问题将有助于我们理解什么才是真正可以计算的。这需要我们更深入地探究这些简单步骤的本质。毕竟,小学生用纸笔能做的事情和大学毕业生能做的事情是不一样的。我们能否精确地定义算法由哪些步骤组成?图灵在数字计算机被制造出来之前就给出了答案。为了回答计算机(任何计算机)能做什么这个问题,他于 1936 年提出了一种模型机。图灵机是一个简单的装置。它由以下部分组成:3
Throughout this book we have been describing algorithms as consisting of simple steps, elementary enough that they can be carried out using a pen and paper. Given that we implement algorithms in computer programs, the question of what really is an algorithm will help us understand what can really be computed. This requires us to dig deeper into the nature of these simple steps. After all, what a primary school student can do with a pen and paper is different than what a college graduate can do. Is it possible to define precisely what kind of steps an algorithm could be made of? Turing offered an answer even before digital computers were built. He proposed a model machine in 1936 in order to answer the question of what a computer, any computer, can do. A Turing machine is a simple contraption. It consists of the following parts:3
有可能精确定义一个算法由哪些步骤组成吗?...[图灵]在 1936 年提出了一种模型机器,以回答计算机(任何计算机)能做什么的问题。
Is it possible to define precisely what kind of steps an algorithm could be made of? . . . [Turing] proposed a model machine in 1936 in order to answer the question of what a computer, any computer, can do.
你可以在下一页的图中看到一台图灵机。4
You can see a Turing machine in the figure on the next page.4
这台图灵机的字母表由 1 和 ∗ 组成。有限控制表明该机器可以处于七种状态之一,即 q0 到 q6。指令表为每种可能的状态各占一行,为每种可能的符号各占一列;我们用 B 代替空白,以便能够看到它。当前状态由行表示,扫描到的符号由列表示。指令表中的每个条目要么包含一个描述一次移动的三元组,要么是一个破折号,表示机器在该行列组合下无事可做。
The alphabet of this particular Turing machine consists of 1 and ∗. The finite control shows that the machine can be in one of seven states, q0 to q6. The instructions table has one row for each possible state, and one column for each possible symbol; we use B to stand in for blank so that we can see it. The current state is indicated by the row, and the scanned symbol by the column. Each entry in the instructions table contains a triplet describing a move, or a dash, meaning that the machine has nothing to do in this row and column combination.
机器的一次移动由三个动作组成:
A move of the machine consists of three actions:
我们的示例图灵机执行一个算法:当 a ≥ b 时,计算两个数字 a 和 b 的差;否则返回零。此运算称为 monus 或真减法,我们记为 a –∙ b。因此,4 –∙ 2 = 2 且 2 –∙ 4 = 0。
Our example Turing machine executes an algorithm that computes the difference of two numbers a and b when a ≥ b; otherwise, it returns zero. This operation is called monus or proper subtraction, and we write a –∙ b. We have 4 –∙ 2 = 2 and 2 –∙ 4 = 0.
首先,我们将机器的输入放在纸带上。输入是由机器字母表中的符号组成的有限串。纸带上其左右两侧的所有其他单元均为空白。在这台图灵机中,输入为 1111∗11。该输入代表一元数字系统中的数字 4 和 2,以 ∗ 分隔。
Initially, we place the machine's input on the tape. The input is a finite string of symbols from the machine's alphabet. All other cells of the tape, to the left and right of it, are blank. In this Turing machine, the input is 1111∗11. The input represents the numbers four and two in the unary numeral system, separated by ∗.
这台机器开始时,其读写头位于最左边的输入单元上,有限控制指向 q0 状态。然后机器开始工作并执行它的移动。如果我们跟踪机器前六次移动的操作,会看到它是这样进行的:
This machine starts with its head on the leftmost input cell. The finite control points at the q0 state. Then the machine starts working and performs its moves. If we follow the machine’s operation for the first six moves, we’ll see that it goes like this:
机器将继续以此方式工作,执行指令表规定的移动。如果我们从更高的层次来看,会意识到机器执行了一个循环。在每次迭代中,它都会找到最左边的 1 并将其替换为空白。然后它向右搜索 ∗。找到后,它继续向右移动,直到找到一个 1,并将其变为 ∗。因此,在每次迭代中,机器都会在 ∗ 的左侧和右侧各划掉一个 1。到了某个时刻,这将不再可能。然后机器会将所有符号替换为空白并终止。纸带上将留下 11,相当于数字 2,周围是空白。为了指示终止,机器进入状态 q6;根据指令表,该状态下无事可做,因此机器停止。
The machine will continue working in this way, performing the moves prescribed by the instruction table. If we take a higher-level view, we'll realize that the machine executes a loop. In each iteration, it finds the leftmost 1 and replaces it with a blank. It then searches right for a ∗. When it finds it, it continues going right until it finds a 1, which it turns into a ∗. Therefore in each iteration, the machine strikes out a 1 on the left and on the right of the ∗. At some point, this will no longer be possible. Then the machine will replace all symbols with blanks and will terminate. The tape will contain 11, equivalent to the number 2, surrounded by blanks. To indicate termination, the machine enters the state q6, where according to the instructions table there is nothing to do, and it stops.
如果我们提供 11∗1111 作为输入,机器将不停地运转,直到停机时纸带上全是空白,相当于 0。如果我们给机器任何由 a 个 1、后跟一个星号、再跟 b 个 1 组成的输入,它都会按其移动规则运行,最后纸带上要么留下 a − b 个 1(如果 a > b),要么全是空白。
If we provide as input 11∗1111, the machine will beaver away until it stops with a tape full of blanks, which is equivalent to 0. If we give the machine any input consisting of a ones followed by an asterisk and then b ones, it will follow its moves until it leaves the tape with either a − b ones, if a > b, or otherwise all blanks.
这台图灵机根据其输入并按照其指令表中描述的指令执行计算 monus 运算的算法。这些步骤非常简单,图灵机的头部需要不停地移动才能完成操作。需要移动 21 次才能得出 2 –∙4 = 0,而需要移动 34 次才能得出 4 –∙2 = 2。但这些操作是多么简单啊!任何有点智力的人都能完成它们。步骤的简单性正是关键所在。您不需要任何高级资格来执行图灵机的步骤;您只需要查找表格、在磁带上移动、一次读写一个符号,并跟踪您的状态。就这些。然而,这并不简单,因为对于算法可以由哪些步骤组成这个问题的答案是,它们就是图灵机可以执行的步骤。
This Turing machine executes an algorithm for computing the monus operation based on its input and following the instructions described in its instructions table. The steps are so elementary that the head of the Turing machine scampers around a lot in order to perform the operation. It will take 21 moves to find that 2 –∙ 4 = 0 and 34 moves to find that 4 –∙ 2 = 2. But how simple these moves are! Anybody with a modicum of intelligence can carry them out. The rudimentary nature of the steps is exactly the point. You do not need any advanced qualifications to perform the steps of a Turing machine; you only need to look up a table, move around on a tape, read and write one symbol at a time, and keep track of what your state is. That is all. Yet it is not trivial because the answer to the question of what kind of steps an algorithm could be made of, is that they are the steps that a Turing machine could perform.
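为了具体说明这种查表式的运作,下面用 Python 模拟一台计算 monus 的图灵机。注意:这里的指令表是作者自行构造的示意(为方便起见使用了八个状态),并非书中那台七状态机器的原表。
To make this table-lookup style of operation concrete, here is a Python simulation of a Turing machine computing monus. Note that the instructions table below is my own illustrative reconstruction (it uses eight states for convenience), not the exact seven-state table of the machine described in the book:

```python
# Each entry: (state, scanned symbol) -> (symbol to write, move, next state).
DELTA = {
    ("q0", "1"): ("B", "R", "q1"),  # strike out the leftmost 1
    ("q0", "*"): ("B", "R", "q5"),  # left side exhausted: erase the rest
    ("q1", "1"): ("1", "R", "q1"),  # search right for the separator
    ("q1", "*"): ("*", "R", "q2"),
    ("q2", "*"): ("*", "R", "q2"),  # skip already-matched positions
    ("q2", "1"): ("*", "L", "q3"),  # strike out a matching 1 on the right
    ("q2", "B"): ("B", "L", "q4"),  # right side exhausted
    ("q3", "1"): ("1", "L", "q3"),  # rewind to the left end of the tape
    ("q3", "*"): ("*", "L", "q3"),
    ("q3", "B"): ("B", "R", "q0"),
    ("q4", "*"): ("B", "L", "q4"),  # erase the *s ...
    ("q4", "1"): ("1", "R", "q7"),
    ("q4", "B"): ("B", "R", "q7"),
    ("q7", "B"): ("1", "R", "q6"),  # ... and restore the 1 erased too early
    ("q5", "1"): ("B", "R", "q5"),  # erase everything: the answer is zero
    ("q5", "*"): ("B", "R", "q5"),
    ("q5", "B"): ("B", "R", "q6"),  # q6 is the halting state
}

def run(tape_input, blank="B"):
    tape = dict(enumerate(tape_input))  # a sparse, unbounded tape
    head, state = 0, "q0"
    while state != "q6":
        symbol = tape.get(head, blank)
        write, move, state = DELTA[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
    return "".join(tape[i] for i in sorted(tape)).strip(blank)

print(run("1111*11"))  # "11": 4 monus 2 = 2
print(run("11*1111"))  # "":   2 monus 4 = 0
```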
在本书中,我们一直在更高层次上描述算法,其步骤也更为复杂。这是为了方便起见,因为图灵机的工作细节层次太低,用它来描述我们的算法会很不方便。但我们所描述的所有算法的所有步骤都可以用正确构造的图灵机的步骤来表示。我们描述了一个简单的图灵机来实现单调运算。对于更复杂的算法,我们需要一台具有更多状态、更大字母表和更大指令表的图灵机。但如果我们愿意,我们仍然可以构建它。
In this book we have been describing algorithms at a higher level, with more complex steps. That is for our convenience because a Turing machine works at such a low level of detail that it would be unwieldy to use it to describe our algorithms. But all the steps of all the algorithms we have depicted could be presented as steps of a properly constructed Turing machine. We have described a simple Turing machine to implement the monus operation. For a more complex algorithm we would need a Turing machine with more states, a bigger alphabet, and a bigger instructions table. But we could still build it, if we wanted.
图灵机的简单性掩盖了它的范围;给定任何算法,我们都可以构建一个图灵机实现它。当计算机运行算法时,任何可由计算机计算的算法都可以由图灵机计算。或者换句话说,我们可以用算法做的任何事,都可以用图灵机做。这是对丘奇-图灵论题的宽泛表述,该论题以图灵和美国数学家阿隆佐·丘奇(1903-1995)的名字命名,丘奇是理论计算机科学的创始人之一。作为一个论题,它尚未被证明,我们不知道它是否可以用数学证明。如果有人设计出某种替代形式的计算来计算图灵机无法计算的东西,那么理论上它可能会遭到推翻。我们认为这不太可能发生。因此,我们将图灵机视为算法概念的形式化描述。5
The simplicity of the Turing machine belies its ambit; given any algorithm, we can construct a Turing machine that implements it. As computers run algorithms, any algorithm that is computable by a computer is computable by a Turing machine. Or in other words, whatever we can do with an algorithm, we can do with a Turing machine. That is a loose rendering of the Church-Turing thesis, named after Turing and the US mathematician Alonzo Church (1903–1995), one of the founders of theoretical computer science. It being a thesis, it is not something that has been proved, and we do not know if it can be proved mathematically. It is theoretically possible that it could be disproved, if somebody devises some alternative form of computation that computes things that a Turing machine cannot compute. We do not believe this is likely to happen. We therefore take the Turing machine to be a formal description of the notion of an algorithm.5
你可以想象任何一台计算机,随你想要多强大都行。这台计算机会比我们描述的在符号纸带上操作的图灵机快得多。但它以算法方式计算的一切,图灵机也能计算。你甚至可以想象我们尚无法制造的计算机。我们的计算机使用比特,比特只能处于两种状态:0 和 1。量子计算机使用量子位。当我们检查量子位的状态时,它将是 0 或 1,就像一个比特。然而,当我们不检查量子位时,它可以处于两个二进制状态 0 和 1 的组合之中,这被称为叠加。就好像一个量子位既是 0 又是 1,直到我们决定读取它,这时它才确定取这两个值中的一个。这使得量子计算机能够同时表示多个计算状态。量子计算机可以让我们快速解决传统计算机难以解决的问题。遗憾的是,以目前的技术水平,建造量子计算机非常困难。而且即使是量子计算机,也无法做到图灵机做不到的事情。即使它能够比任何现有的传统计算机或任何图灵机更高效地解决某些问题,它仍然无法解决任何图灵机无法解决的问题。
You can imagine any computer, as powerful as you want it. The computer will be way faster than a Turing machine that operates on a tape of symbols as we have described it. But everything it calculates algorithmically, a Turing machine can calculate too. You can even imagine computers that we have not been able to manufacture yet. Our computers work with bits, which can exist in only two states, 0 and 1. Quantum computers work with qubits. When we examine the state of a qubit, this will be 0 or 1, like a bit. Yet a qubit, when we don't examine it, can be in a combination, called superposition, of the two binary states 0 and 1. It is as if a qubit is both 0 and 1, until we decide to read it, when it decides to be one of these two values. This allows quantum computers to represent multiple states of computation at once. A quantum computer would allow us to quickly solve problems that are not easily solved by classical computers. Unfortunately, building a quantum computer is difficult with the current technology. And even a quantum computer could not do something that a Turing machine cannot do. Even though it would be able to solve some problems more efficiently than any existing classical computer, or any Turing machine for that matter, it still won't be able to solve any problems that a Turing machine cannot solve.
我们的计算极限是由图灵机决定的。计算机能做到的一切,我们都能用笔和纸在符号带上完成。你在任何数字设备上看到的一切,本质上都是一系列这样的基本符号操作。在自然科学中,我们观察世界,并相信我们能够用基本原理来解释它。在计算领域,情况正好相反。我们拥有基本原理,并相信我们能够用它们创造惊人的成就。
Our computational limits are given by Turing machines. Anything a computer can do, we could really do with a pen and paper, working on a tape of symbols. Everything you see executed on any digital device is, in essence, a series of such elementary symbol manipulations. In the natural sciences, we behold the world and believe that we can explain it using fundamental principles. In computing, it is the other way around. We have our fundamental principles and believe that we can do amazing feats with them.
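To make "elementary symbol manipulations" concrete, here is a minimal Turing machine simulator in Python. This is my own sketch, not from the book; the rule format and the toy bit-flipping machine are illustrative assumptions.

```python
def run_turing_machine(rules, tape, state="start", pos=0):
    """Run a Turing machine until it enters the 'halt' state.

    rules maps (state, symbol) -> (new_symbol, move, new_state),
    where move is -1 (left), 0 (stay), or +1 (right).
    """
    tape = list(tape)
    while state != "halt":
        symbol = tape[pos]
        new_symbol, move, state = rules[(state, symbol)]
        tape[pos] = new_symbol  # write the new symbol
        pos += move             # move the head
    return "".join(tape)

# A toy machine that flips every bit and halts at the blank marker "_".
flip_rules = {
    ("start", "0"): ("1", +1, "start"),
    ("start", "1"): ("0", +1, "start"),
    ("start", "_"): ("_", 0, "halt"),
}
```

Each rule reads one symbol, writes one symbol, moves the head one cell, and changes state — nothing more; in principle, everything a digital device executes reduces to steps of this kind.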
当图灵提出他的机器作为计算模型时,数字计算机甚至还不存在。但这并没有阻止他探索未来计算机器的能力。当我们思考计算机的局限性时,我们应该还要记住,在这些限制之内,人类的智慧创造了奇迹。计算的局限性并没有限制我们的创造力,让我们能够继续为生活的方方面面开发算法。文字在美索不达米亚发明时,其目的是为了记录,而不是为了创作文学作品。最早的作家可能是会计师,而不是作家,然而威廉·莎士比亚却从如此卑微的起点走出来。谁知道,随着时间的推移,算法会带来什么呢?
When Turing proposed his machine as a model for computation, digital computers did not even exist. That did not prevent him from exploring the capabilities of computing machines that would be created in the future. When we think about the limits of computers, we should also keep in mind that inside these limits, the human intellect has created wonders. The limits of computation have not curtailed our creativity to continue developing algorithms for every aspect of our lives. When writing was invented in Mesopotamia, its purpose was to aid record keeping, not write literature. The first writers were probably accountants, not authors, yet from such humble beginnings emerged William Shakespeare. Who knows what, in time, algorithms will bring.
我们的计算极限是由图灵机决定的。计算机能做到的一切,我们用笔和纸就能做到……你在任何数字设备上看到的所有执行操作……都是一系列这样的基本符号操作。
Our computational limits are given by Turing machines. Anything a computer can do, we could really do with a pen and paper. . . . Everything you see executed on any digital device is . . . a series of such elementary symbol manipulations.
激活(神经元)
activation (neuron)
神经元输出的发射。
The emission of output from a neuron.
激活函数
activation function
根据神经元的输入确定其输出的函数。
A function that determines the output of a neuron based on its input.
无环图
acyclic graph
没有循环的图。
A graph that has no cycle.
邻接矩阵
adjacency matrix
表示图的矩阵。图中的每个顶点对应一行和一列。图中由一条边连接的两个顶点对应的行和列的每个元素的内容为 1;所有其他元素的内容为 0。
A matrix that represents a graph. It has a row and column for each vertex of the graph. Its contents are 1 in each entry whose row and column correspond to two vertices connected by an edge in the graph; all other entries are 0.
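The definition can be sketched in a few lines of Python (the function name and example graph are my own, for illustration):

```python
def adjacency_matrix(n, edges):
    """Build the n-by-n adjacency matrix of an undirected graph.

    edges is a list of (u, v) pairs with vertices numbered 0 to n-1.
    """
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1  # undirected graph: the matrix is symmetric
    return matrix

# A triangle on vertices 0, 1, 2, plus an isolated vertex 3.
m = adjacency_matrix(4, [(0, 1), (1, 2), (0, 2)])
```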
算法
algorithm
1. 翻到书的第一页。
1. Go to the first page of the book.
2.阅读当前页面。
2. Read the current page.
3.如果不明白,请转到步骤2。否则,请转到步骤4。
3. If you don’t understand, go to step 2. Otherwise go to step 4.
4. 如果有下一页,则将其设为当前页面并转到步骤 2。否则终止。
4. If there is a next page, make it your current page and go to step 2. Otherwise terminate.
近似
approximation
使用某种算法来解决问题,虽然可能无法找到最优解,但距离最优解也不远。
Solving a problem by using an algorithm that may not find the optimal solution, but one that is not far from it.
自动微分
automatic differentiation
一组以数值方式(而非解析方式)计算函数导数的技术,这需要使用微积分规则来区分函数。
A set of techniques to evaluate the derivative of a function numerically—that is, not analytically, which would entail using the calculus rules for differentiating functions.
反向链接
backlink
指向我们正在访问的网页的链接,以及包含指向我们正在访问的网页的链接的网页。
A link that points to the web page we are visiting, and by extension, the web pages that contain links that point to the web page we are visiting.
反向传播算法
backpropagation algorithm
训练神经网络的基本算法。网络通过将调整从最后一层传播回第一层来修正其配置(权重和偏差)。
A fundamental algorithm for training neural networks. The network corrects its configuration (its weights and biases) by propagating adjustments from the final layer back toward the first layer.
偏见
bias
附加到神经元上的数值,用于控制其激发倾向。
A numerical value attached to a neuron that controls its propensity to fire.
大O
big O
计算复杂度的一种表示法。给定一个算法,对于大于某个阈值的输入,它给出了该算法完成所需步数的上界。我们要求输入大于某个阈值,因为我们感兴趣的是算法在大数据上的行为。算法的大 O 复杂度保证了,对于大数据,算法所需的步数不会超过特定值。例如,复杂度 O(n²) 表示,对于大小为 n 且超过某个阈值的输入,算法完成所需的步数不会超过 n² 的常数倍。
A notation for computational complexity. Given an algorithm and input greater than some threshold, it gives us an upper bound on the expected number of steps required by the algorithm to complete. We want the input to be larger than some threshold because we are interested in the behavior of an algorithm on large data. The big O complexity for an algorithm gives us a guarantee that for large data, the algorithm will not require more than a particular number of steps. For example, a complexity of O(n²) means that for input of size n that exceeds some threshold, the algorithm will not take more than a constant multiple of n² steps to complete.
二分查找
binary search
一种用于有序数据的搜索算法。我们检查搜索空间中间的项。如果它与我们要查找的项匹配,则搜索成功。否则,我们将重复该过程,直到搜索空间的左半部分或右半部分,具体取决于我们是否超出目标范围。
A search algorithm that works on ordered data. We check the item in the middle of the search space. If it matches the one we are looking for, we are fine. Otherwise, we repeat the procedure to the left or right half, depending on whether we have overshot or undershot our target.
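The procedure above can be sketched in Python (the code is my own illustration, not from the book):

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2  # middle of the current search space
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            low = mid + 1        # undershot: continue in the right half
        else:
            high = mid - 1       # overshot: continue in the left half
    return -1
```

Each comparison halves the search space, which is why binary search takes logarithmic time.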
位(比特)
bit
计算机上存储信息的基本单位。“位”可以取两个值:0 或 1。“位”一词源于二进制数字。
The basic unit of information stored on a computer. A bit can take one of two values, 0 or 1. The word bit comes from binary digit.
程序错误(bug)
bug
程序中的错误。托马斯·爱迪生曾用“bug”一词来指代技术故障。在计算机发展的早期,真正的“bug”会进入机器内部,导致机器故障。1947年,在哈佛Mark II计算机内部发现了一只能够造成这种故障的飞蛾。这只飞蛾被保存在机器的日志中,该日志目前收藏于史密森尼美国国家历史博物馆。
An error in a program. The term bug was used by Thomas Edison for a technical fault. In the early days of computing, real bugs would make their way into the machinery, causing them to fail. A moth that did that was found inside the Harvard Mark II computer in 1947. The moth has been preserved in the machine’s logbook, which is part of the collection of the Smithsonian National Museum of American History.
分类交叉熵
categorical cross-entropy
计算两个概率分布之间的差异的损失函数。
A loss function that calculates the difference between two probability distributions.
中央处理器(CPU)
central processing unit (CPU)
执行计算机内部程序指令的芯片。
The chip that carries out the instructions of a program inside a computer.
边色数
chromatic index
在图着色中,对图的边进行着色所需的最少颜色数。
In graph coloring, the minimum number of colors required to color the edges of the graph.
丘奇-图灵论题
Church-Turing thesis
假设所有能被算法计算的东西都能被图灵机计算。
The hypothesis that everything that can be computed by an algorithm can be computed by a Turing machine.
分类器
classifier
将观察结果归类为多个可能类别之一的程序。
A program that classifies an observation in one out of a number of possible classes.
复杂性(计算复杂性)
complexity (computational complexity)
算法运行所需的时间。该时间以完成所需基本计算步骤的顺序表示。
The time required for an algorithm to run. The time is expressed on the order of elementary computational steps required to complete.
复杂度类
complexity class
需要相同数量的资源(例如时间或内存)才能解决的一组问题。
A set of problems that require the same amount of a resource (such as time or memory) to be solved.
控制结构
control structure
算法或程序中步骤的组合方式有三种:序列、选择和迭代。
The three ways in which steps can be combined in an algorithm or program: sequence, selection, and iteration.
环
cycle
在图中,一条起点和终点为同一节点的路径。
In graphs, a path that starts and ends at the same node.
悬垂节点
dangling node
在 PageRank 算法中,只有传入边而没有传出边的节点。
In the PageRank algorithm, a node with only incoming edges and no outgoing edges.
数据结构
data structure
一种组织数据的方式,以便我们可以使用一组特定的、规定的操作来处理数据。
A way to organize data, such that we can handle the data with a set of specific, prescribed operations.
决策边界
decision boundary
一个或多个变量的值,构成基于一个或多个变量的单个决策的两个不同结果之间的边界。
The values of one or more variables that form the boundary between two different outcomes of a single decision based on the variable or variables.
深度学习
deep learning
神经网络由许多隐藏层组成,这些隐藏层的排列方式使得后续层代表更深层次的概念,对应于更高的抽象层次。
Neural networks that consist of many hidden layers, arranged such that succeeding layers represent deeper concepts, corresponding to higher abstraction levels.
度(节点)
degree (node)
与节点相邻的边的数量。
The number of edges adjacent to a node.
紧密连接
densely connected
神经网络中的层排列方式使得某一层的所有神经元都与下一层的所有神经元相连。
Layers in a neural network arranged such that all the neurons of a layer are connected to all the neurons of the following layer.
衍生物
derivative
函数在某一点的斜率;或者说,函数的变化率。例如,加速度是速度的导数(速度随时间的变化率)。
The slope of a function at a point; equivalently, the rate of change of a function. For example, acceleration is the derivative of speed (the rate of change of speed in time).
Dijkstra算法
Dijkstra’s algorithm
1956 年,年轻的荷兰计算机科学家 Edsger Dijkstra 发明了一种算法,用于查找图中两个节点之间的最短路径。该算法适用于包含正权重的图。
An algorithm invented in 1956 by a young Dutch computer scientist, Edsger Dijkstra, to find the shortest path between two nodes in a graph. It works with graphs that contain positive weights.
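A compact Python sketch of the algorithm follows. It is my own illustration, not from the book, and uses a priority queue — one common way to implement Dijkstra's idea; the example road network is made up.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source in a graph with positive weights.

    graph maps each node to a list of (neighbor, weight) pairs.
    """
    distances = {node: float("inf") for node in graph}
    distances[source] = 0
    queue = [(0, source)]
    while queue:
        dist, node = heapq.heappop(queue)  # closest unexplored node
        if dist > distances[node]:
            continue  # stale queue entry; a shorter path was already found
        for neighbor, weight in graph[node]:
            candidate = dist + weight
            if candidate < distances[neighbor]:
                distances[neighbor] = candidate  # relax the estimate
                heapq.heappush(queue, (candidate, neighbor))
    return distances

roads = {
    "a": [("b", 4), ("c", 1)],
    "c": [("b", 2), ("d", 5)],
    "b": [("d", 1)],
    "d": [],
}
```

Setting every distance to infinity and then improving the estimates is exactly the relaxation method described elsewhere in this glossary.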
有向图
directed graph
边有向的图。有向图也简称为有向图。
A graph in which the edges are directed. A directed graph is also called a digraph for short.
分而治之
divide and conquer
一种解决问题的方法,我们将一个问题分解成几个较小的问题(通常是两个),然后对较小的问题执行相同的操作,直到问题变得非常小,以至于可以直接找到解决方案。
A problem-solving method where we solve a problem by breaking it into smaller problems (typically two) and then do the same on the smaller problems, until the problems get so small that the solution is straightforward to find.
边缘着色
edge coloring
为图的边分配颜色,使得任何两个相邻的边都不会共享相同的颜色。
The assignment of colors to the edges of a graph so that no two adjacent edges share the same color.
特征向量
eigenvector
在线性代数中,特征向量是一个向量,当我们将它与一个特定的矩阵相乘时,结果就是这个向量乘以一个数;这个数就是它的特征值。PageRank 会找到 Google 矩阵的第一个特征向量,也就是 Google 矩阵中特征值最大的那个,该特征值等于 1。
In linear algebra, an eigenvector is a vector that, when we multiply it by a specific matrix, the result is the same vector multiplied by a number; that number is its eigenvalue. PageRank finds the first eigenvector of the Google matrix—that is, the eigenvector of the Google matrix with the largest eigenvalue, which is equal to one.
轮次(epoch)
epoch
在机器学习中,训练期间对整个训练数据集进行遍历。
In machine learning, a pass, during training, through the whole training data set.
欧几里得算法
Euclid’s algorithm
古希腊数学家欧几里得(约公元前300年)所著的《几何原本》共13卷,其中介绍了一种求两个整数最大公约数的算法。《几何原本》探讨几何和数论,从公理入手,并基于公理证明定理。它是现存最古老的运用演绎法的数学著作,因此也是科学史上最具影响力的著作之一。
An algorithm for finding the greatest common divisor of two integers, presented in the Elements, a set of 13 books written by the ancient Greek mathematician Euclid (ca. 300 BCE). The Elements treats geometry and number theory, starting from axioms and proving theorems based on the axioms. It is the oldest extant work of mathematics that uses this deductive approach, and as such, one of the most influential books in the history of science.
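The algorithm itself fits in a few lines of Python (my own sketch of the standard remainder version):

```python
def gcd(a, b):
    """Greatest common divisor of two integers by Euclid's algorithm.

    Repeatedly replace the pair (a, b) with (b, a mod b); when the
    remainder reaches zero, the other number is the gcd.
    """
    while b != 0:
        a, b = b, a % b
    return a
```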
欧拉路径
Eulerian path
图中的一条路径,每条边恰好被访问一次。也称为欧拉游走。
A trail through a graph such that each edge is visited exactly once. It is also called a Eulerian walk.
欧拉巡游
Eulerian tour
起点和终点都相同的欧拉路径。也称为欧拉回路。
A Eulerian path that starts and ends at the same node. It is also called a Eulerian circuit.
欧拉数
Euler’s number
数学常数 e,约等于 2.71828。当 n 趋近于无穷大时,它是 (1 + 1/n)ⁿ 的极限。
The mathematical constant e, approximately equal to 2.71828. It is the limit of (1 + 1/n)ⁿ as n approaches infinity.
执行路径
execution path
算法在执行过程中进行的一系列步骤。
The series of steps that an algorithm carries out during its execution.
指数增长
exponential growth
一种增长模式,其中数量逐次乘以自身。例如,我们可能从 a 个数量开始,然后得到 a² 个,接着是 a³ 个,总体而言是 aⁿ 个。数字以指数增长的方式快速增长。
A growth pattern in which a number of things is successively multiplied by itself. For example, we may start with a things, and then we'll get a² things, then a³, and in general aⁿ. Numbers grow fast with exponential growth.
阶乘
factorial
自然数 n 的阶乘是从 1 到 n(含 n)所有数的乘积。我们使用符号 n!,因此有 n! = 1 × 2 × ⋯ × n。该定义可以扩展到所有实数,但这里不涉及。
The factorial of a natural number n is the product of all numbers from 1 up to and including n. We use the symbol n! so we have n! = 1 × 2 × ⋯ × n. The definition can be extended to all real numbers, but that does not concern us here.
阶乘复杂度
factorial complexity
遵循阶乘增长的计算复杂度。用大 O 表示法表示为 O(n!)。
Computational complexity that follows factorial growth. In big O notation, it is O(n!).
激发(神经元)
fire (neuron)
参见激活(神经元)。
See activation (neuron).
拟合
fitting
在机器学习中,从数据中学习的过程。在这个过程中,我们构建一个符合观察结果的模型。
In machine learning, the process of learning from the data. In this process we construct a model that fits the observations.
垃圾进,垃圾出
garbage in, garbage out
如果我们给程序输入垃圾,而不是它的预期输入,那么我们不应该期待奇迹:程序将产生垃圾而不是它的预期输出。
If we feed a program garbage, instead of its expected input, we should expect no miracles: the program will produce garbage instead of its expected output.
全局最优
global optimum
问题的最佳整体解决方案。
The best overall solution to a problem.
谷歌矩阵
Google matrix
一种特殊的矩阵(超链接矩阵的修改版),用于 PageRank 算法中的幂法。
A special kind of matrix (a modification of the hyperlink matrix) that is used in the power method in the PageRank algorithm.
梯度
gradient
包含函数所有偏导数的向量。
A vector containing all the partial derivatives of a function.
图
graph
一组节点(也称为顶点)和边(也称为链接),用于连接这些节点。图可以用来模拟任何类型的链接结构,从人到计算机网络。因此,许多问题都可以用图来建模,并且许多基于图的算法也得到了开发。
A set of nodes, also called vertices, and edges, also called links, connecting them. Graphs can be used to model any kind of linked structure, from people to computer networks. As a result, many problems can be modeled as graphs, and many algorithms have been developed that work on top of them.
图形着色
graph coloring
图的边或顶点着色。
The edge or vertex coloring of a graph.
图形处理单元 (GPU)
graphics processing unit (GPU)
专门设计用于处理计算机内部图像创建和处理指令的芯片。
A chip specially designed to handle the instructions for the creation and manipulation of images inside a computer.
最大公约数(gcd)
greatest common divisor (gcd)
给定两个整数,找出能整除这两个整数的最大整数。
Given two integers, the largest integer that divides both.
贪婪算法
greedy algorithm
一种算法,当我们必须在多个行动方案之间做出选择时,我们会选择能带来最大即时收益的方案。这并不一定能带来最终的最优结果。
An algorithm in which when we have to choose between alternative courses of action, we choose the one that gives us the greatest immediate payoff. This does not necessarily lead to the optimum outcome in the end.
硬件
hardware
构成计算机或数字设备的物理组件。该术语是对软件的补充。
The physical components that make up a computer or digital device. The term complements software.
头
head
列表中的第一个项目。
The first item in a list.
启发式
heuristic
在算法中,选择各种方案的策略。贪婪启发式算法要求我们选择目前看起来最好的选项(而不考虑未来会发生什么)。
A strategy for making choices among alternatives in an algorithm. A greedy heuristic would require us to take the option that looks best right now (never mind what could happen in the future).
隐藏层
hidden layer
不直接连接到网络输入或输出的神经网络层。
A neural network layer that is not directly connected to the input or output of the network.
Hierholzer算法
Hierholzer algorithm
一种在图上寻找欧拉回路的算法。该算法由德国数学家卡尔·希尔霍尔泽(Carl Hierholzer)于 1873 年发表。
An algorithm for finding Eulerian circuits on graphs. It was published by the German mathematician Carl Hierholzer in 1873.
爬山
hill climbing
这是一个描述问题解决的比喻。解决方案在山顶,我们必须从山脚开始攀登。每一步都可能需要在不同的路径中进行选择。根据我们的选择,我们可能会选择一条总体上最好的路径,一条并非最佳但仍然能带我们到达山顶的路径,或者一条通往高原的路径。如果发生最坏的情况,我们到达高原,就必须回到之前的位置,开始沿着另一条路径前进。
A metaphor for describing problem solving. The solution is at the top of the hill, and we have to climb from its foot. At each step there may be a decision to take among alternative paths. Depending on our choices, we may select the best path overall, a path that is not the best but still takes us to the top, or alas a path that leads to a plateau. If the worst happens and we reach a plateau, we’ll have to go back to a previous position to start moving along a different path.
超级链接
hyperlink
文本中对文本另一部分或其他文本的引用。在网络上,超链接是网页之间的链接,用户在浏览时可以点击。
A reference from a text to another part of the text or a different text. On the web, hyperlinks are links between web pages that the user may follow while browsing.
超链接矩阵
hyperlink matrix
表示图的结构的矩阵;它类似于邻接矩阵,但我们将其行的元素除以该行中非零元素的数量。
A matrix representing the structure of a graph; it is like an adjacency matrix, but we divide the elements of its row by the number of nonzero elements in the row.
超平面
hyperplane
平面在三维以上空间的概括。
The generalization of the plane in more than three dimensions.
超文本
hypertext
包含超链接的文本。
Text that contains hyperlinks.
图像识别
image recognition
识别图像中的模式的计算任务。
The computational task of recognizing patterns in images.
插入排序
insertion sort
一种排序方法,我们将每个项目取出并插入到已排序项目中的正确位置。
A sorting method where we take each item and insert it into its correct position among the already sorted items.
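A Python sketch of the method (my own illustration, not from the book):

```python
def insertion_sort(items):
    """Sort a list in place by inserting each item into its correct
    position among the already sorted items to its left."""
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        # Shift larger sorted items one place to the right.
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current  # drop the item into its correct position
    return items
```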
互联网
internet
通过一套通用通信协议互连的全球计算机和数字设备网络。最初,其首字母大写(Internet),因为互联网可以指任何超出机构内部范围的网络,这种网络被称为内部网 (Intranet)。然而,随着全球互联网的兴起,首字母大写逐渐失宠,这可能节省了大量的笔墨。
A global network of computers and digital devices, interconnected by means of a common suite of communication protocols. Initially, it was with its first letter capitalized (Internet) because internet could refer to any network that extended beyond the internal confines of an institution, which is called an intranet. As the global internet took off, however, the initial capital fell out of favor, probably saving a significant amount of ink.
难解问题
intractable problem
对于这个问题,我们所知的最佳算法将花费大量时间来处理除简单情况之外的所有情况。
A problem for which the best algorithms we know will take an inordinate amount of time to handle anything but trivial cases.
迭代
iteration
参见循环。
See loop.
键
key
记录的一部分,用于排序或查找。当键无法分解为更小的部分(例如,身份证号码)时,键可以是原子键;当键由更小的数据块组成(例如,全名由名字、中间名和姓氏组成)时,键可以是复合键。
A part of a record that we use for sorting or finding it. A key may be atomic, when it cannot be decomposed into smaller parts (for instance, an identification number), or composite, when it consists of smaller pieces of data (like the full name comprising first name, middle name, and surname).
标签
label
机器学习中,表示观察结果所属类别的值。在训练中,计算机会输入问题及其解决方案;当问题是分类时,解决方案就是代表类别的标签。
In machine learning, a value representing the category to which an observation belongs. In training, the computer is given problems along with their solutions; when the problem is classification, the solutions are the labels representing the classes.
线性搜索
linear search
一种搜索算法,依次检查每个项,直到找到所需的项。它也被称为顺序搜索。
A search algorithm in which we examine each item in turn until we find the one we are looking for. It is also called a sequential search.
线性时间
linear time
时间与算法的输入大小成比例,写为 O(n)。
Time proportional to the input of an algorithm, written as O(n).
线性可分
linearly separable
一种数据集,其观测值可在二维中通过直线、三维中通过平面、或在更多维度中通过超平面分为两类。
A data set whose observations can be separated into two categories by a straight line in two dimensions, plane in three dimensions, or hyperplane in more dimensions.
列表
list
一种包含项的数据结构。每个项都指向下一个项,除了最后一个项之外,最后一个项指向任何位置,或者说指向空。因此,这些项彼此链接,这样的列表也称为链表。
A data structure that contains items. Each item points to the next one, apart from the last item, which points nowhere, or to null, as we say. The items are therefore linked to each other, and such a list is also called a linked list.
局部最优
local optimum
一个比所有其他邻近解都更好的解,但并非整体最优。邻近解是指我们只需从当前解一步移动就能得到的解。
A solution that is better than all the other neighboring solutions, but not the overall best. A neighboring solution is one that we can get to with a single move from the solution we are at now.
对数
logarithm
乘幂运算的逆运算。对数回答这样一个问题:"我应该把一个数提升到多少次幂,才能得到我想要的值?"如果我们问:"我应该把 10 提升到多少次幂才能得到 1000?",答案是 3,因为 10³ = 1000。被提升幂的那个数称为对数的底数。如果 aʸ = x,我们记为 logₐx = y。对于 a = 2,我们记为 lg x。
The inverse of raising to a power. The logarithm is the answer to the question, "To which power should I raise a number to get the value I want?" If we ask, "To which power should I raise 10 to get 1,000?," the answer is 3 because 10³ = 1,000. The number we raise to the power is called the base of the logarithm. We write logₐx = y if aʸ = x. For a = 2 we write lg x.
对数时间
logarithmic time
时间与算法输入的对数成正比,例如 O(log n)。好的搜索算法需要对数时间。
Time proportional to the logarithm of the input of an algorithm—for example, O(log n). Good searching algorithms take logarithmic time.
对数线性时间
loglinear time
时间与算法输入的大小和输入对数的乘积成正比,例如 O(n log n)。好的排序算法所需的时间是对数线性的。
Time proportional to the product of the size of the input and logarithm of the input of an algorithm—for example, O(n log n). Good sorting algorithms take loglinear time.
循环
loop
计算机程序中重复执行的指令序列。循环在满足条件时结束。未结束的循环是无限循环,通常是一个 bug,因为它可能导致程序无法终止。参见迭代。
A sequence of instructions in a computer program that is repeated. A loop ends when a condition is fulfilled. A loop that does not end is an infinite loop and is usually a bug because it may lead to a program that fails to terminate. See iteration.
损失
loss
机器学习算法的实际输出与期望输出之间的差异。通常由损失函数计算。
The difference between the actual and desired output of a machine learning algorithm. It is typically calculated by a loss function.
机器学习
machine learning
使用通过从示例中自动学习来解决问题的算法。
The use of algorithms that solve problems by learning automatically from examples.
矩阵
matrix
矩阵是一种矩形阵列,通常由数字或更常见的数学表达式组成。矩阵的内容水平排列成行,垂直排列成列。
A rectangular array, typically of numbers or more generally mathematical expressions. The contents of a matrix are arranged horizontally in rows and vertically in columns.
马太效应
Matthew effect
富人越来越富、穷人越来越穷的现象。该词源于《马太福音》(25:29)。人们发现它适用于许多领域,而不仅仅是物质财富。
The phenomenon of the rich getting richer and poorer getting poorer. Named after the Gospel of Matthew (25:29), it has been found to apply to many contexts, not just material wealth.
最小化问题
minimization problem
在可能的解决方案中,我们尝试找到具有最小值的解决方案的问题。
A problem in which, among the possible solutions, we try to find the one with the minimum value.
合并排序
merge sort
一种通过反复合并越来越大的排序项集来进行的排序方法。
A sorting method that works by repeatedly merging larger and larger sets of sorted items.
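A Python sketch of the method (my own illustration, not from the book): split the list in half, sort each half, and merge the sorted halves.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]  # append whatever is left over

def merge_sort(items):
    """Sort by splitting in half, sorting each half, and merging them."""
    if len(items) <= 1:
        return items  # a list of zero or one items is already sorted
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))
```

This is a classic example of divide and conquer, and it runs in loglinear time.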
摩尔定律
Moore’s law
1965年,仙童半导体公司和英特尔创始人戈登·摩尔观察到,集成电路中晶体管的数量大约每两年翻一番。这是指数增长的一个例子。
The observation, made in 1965 by Gordon Moore, founder of Fairchild Semiconductor and Intel, that the number of transistors in an integrated circuit doubles about every two years. It is an example of exponential growth.
移至前面
move to front
自组织搜索算法。当我们找到我们要找的物品时,我们会将其移到第一位。
A self-organizing search algorithm. When we find the item we are looking for, we move it to the first position.
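A Python sketch of the idea (my own illustration, not from the book):

```python
def move_to_front_search(items, target):
    """Linear search that moves the found item to the front of the list.

    Returns the item's new position, or -1 if it is not found.
    """
    for i, item in enumerate(items):
        if item == target:
            items.insert(0, items.pop(i))  # promote the popular item
            return 0
    return -1
```

Items that are searched for often end up near the front, so later searches for them finish quickly.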
多重图
multigraph
一条边可以出现多次的图。
A graph in which an edge can occur more than once.
多集
multiset
一个元素可以出现多次的集合;在数学中,在普通集合中,一个元素不能出现超过一次。
A set in which an element can appear multiple times; in mathematics, in a normal set an element cannot appear more than once.
节点
node
各种数据结构中的一项。列表中的项称为节点。
An item in various data structures. Items in lists are called nodes.
神经元
neuron
神经元是构成神经系统基本组成部分的细胞。它接收来自其他神经元的信号,并将其传递给神经系统中的其他神经元。
A neuron is a cell that forms the basic building block of the nervous system. It receives signals from other neurons and propagates them to other neurons in the nervous system.
无效的
null
计算机中一片虚无。
Nothingness in a computer.
在线算法
online algorithm
一种无需问题全部输入即可生成解决方案的算法。在线算法会逐步获取输入,并在每次获取输入时,根据迄今为止收到的输入生成解决方案。
An algorithm that does not require the full input to a problem in order to produce a solution. An online algorithm gets the input incrementally, as this arrives, and at each point produces a solution that takes account of the input it has received so far.
起音
onset
节奏的重音部分。
The accented part of a rhythm.
最优停止问题
optimal stopping problem
当您试图最大化奖励或最小化惩罚时,问题在于知道停止的最佳时间。
The problem of knowing the best time to stop when you are trying to maximize a reward or minimize a penalty.
优化器
optimizers
优化函数值的算法。在机器学习中,优化器通常会最小化损失函数的值。
Algorithms that optimize the value of a function. In machine learning, optimizers typically minimize the value of the loss function.
过度拟合
overfitting
相当于机器学习中的死记硬背。我们试图训练的模型过于严格地遵循训练数据,以至于拟合度过高。结果,它无法预测其他未知数据的正确值。
The equivalent of learning by rote in machine learning. The model that we are trying to train follows the training data so closely that it fits them too well. As a result, it does not predict correct values for other, unknown data.
溢出
overflow
超出计算机允许值的范围。
Going beyond the range of allowable values on a computer.
PageRank
PageRank
一种根据网页重要性对其进行排名的算法。它由谷歌创始人开发,是谷歌搜索引擎的基础。网页的排名即其 PageRank。
An algorithm used to rank web pages in terms of their importance. It was developed by the founders of Google and was the foundation of the Google search engine. The rank of a web page is its pagerank.
网页排名向量
pagerank vector
包含图表的 PageRank 的向量。
A vector containing the pageranks of a graph.
偏导数
partial derivative
在多变量函数中,当其他变量保持不变时,该函数对一个变量求导。
In a function of many variables, the derivative of the function with respect to one variable, holding all other variables constant.
路径
path
在图中,连接一系列节点的边序列。
In a graph, a sequence of edges that connect a sequence of nodes.
路径长度
path length
图中路径上权重的总和。如果图没有权重,则权重为构成路径的链接数。
The sum of the weights along a path in a graph. If a graph does not have weights, it is the number of the links constituting the path.
感知器
perceptron
使用阶跃函数进行激活的人工神经元。
An artificial neuron that uses the step function for its activation.
排列
permutation
以不同的顺序重新排列一些数据。
A rearrangement of some data in a different order.
指针
pointer
计算机内存中的一个位置,用于保存计算机内存中另一个位置的地址。这样,前者指向后者。
A place in computer memory that holds the address of another place in computer memory. In this way, the former points to the latter.
多项式时间
polynomial time
与算法输入的某个常数次幂成比例的时间,例如 O(n²)。
Time proportional to the input to an algorithm raised to a constant power, such as O(n²).
幂法
power method
一种算法,从一个向量开始,将其与一个矩阵相乘,然后反复将结果与矩阵相乘,直到收敛为一个稳定的值。幂法是 PageRank 的核心;它收敛的向量是 Google 矩阵的第一个特征向量。
An algorithm that starts with a vector, multiplies it by a matrix, and then repeatedly multiplies the result by the matrix until it converges into a stable value. The power method is at the heart of PageRank; the vector at which it converges is the first eigenvector of the Google matrix.
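A Python sketch of the method (my own illustration; the tiny 2×2 column-stochastic matrix stands in for the Google matrix, whose largest eigenvalue is 1):

```python
def power_method(matrix, iterations=50):
    """Repeatedly multiply a vector by a matrix until it stabilizes.

    matrix is given as a list of rows; we start from a uniform vector.
    """
    n = len(matrix)
    vector = [1.0 / n] * n
    for _ in range(iterations):
        # One matrix-vector multiplication per iteration.
        vector = [sum(matrix[i][j] * vector[j] for j in range(n))
                  for i in range(n)]
    return vector

# A toy column-stochastic matrix; its eigenvector for eigenvalue 1
# is (0.75, 0.25), so the iteration converges there.
stable = power_method([[0.9, 0.3], [0.1, 0.7]])
```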
程序
program
用编程语言编写的一组描述计算过程的指令。
A set of instructions, written in a programming language, that describes a computational process.
编程
programming
编写计算机程序的艺术。
The art of writing computer programs.
编程语言
programming language
一种用于描述计算步骤的人工语言。编程语言可以在计算机上执行。与人类语言类似,编程语言也具有语法和文法,规定了可以用它编写的内容。编程语言种类繁多,并且为了提高编程效率(或者因为许多人无法抗拒创建自己的语言并希望其被广泛采用),新的编程语言层出不穷。编程语言可以是高级的,因为它看起来与人类语言有些相似;也可以是低级的,因为它的结构比较简陋,反映了底层硬件的特性。
An artificial language that can be used to describe computational steps. A programming language can be executed on a computer. Like a human language, a programming language has syntax and grammar, specifying what can be written in it. Several programming languages exist, and new programming languages are developed all the time in an effort to make programming more productive (or because many people cannot resist creating their own language and hope it will be widely adopted). A programming language can be high level, when it looks somewhat akin to a human language, or low level, when its constructs are rudimentary, mirroring the underlying hardware.
穿孔卡片
punched card
一种硬纸,通过其上打孔的位置记录信息。它也被称为穿孔卡。穿孔卡用于早期计算机,以及更早的提花织机等机器,用于描述待编织的图案。
A piece of stiff paper that records information by the location of the punched holes on it. It is also called a punch card. The cards were used in early computers, and before that, in machines such as Jacquard looms, in which they described the pattern to be woven.
量子计算机
quantum computer
一种利用量子现象进行计算的计算机。量子计算机以量子比特而非比特为基础进行计算。某些问题在量子计算机上的求解速度比传统计算机快得多。量子计算机的制造面临着严峻的物理挑战。
A computer that leverages quantum phenomena to perform computations. Quantum computers work with qubits instead of bits. Some problems can be solved much faster on quantum computers than on classical ones. The manufacture of quantum computers presents difficult physical challenges.
量子比特
qubit
量子信息的基本单位。量子比特可以存在于两个状态(0 和 1)的叠加态中,直到我们测量它时,它会坍缩为两个二进制值之一。量子比特可以利用量子特性(例如电子自旋)来实现。
The basic unit of quantum information. A qubit can exist in a superposition of two states, 0 and 1, until we measure it, when it collapses to one of the two binary values. A qubit can be implemented using quantum properties, such as the spin of an electron.
快速排序
quicksort
一种排序方法,通过反复选择一个项目并移动其周围的其他项目,使得所有较小的项目都在一侧,其余项目都在另一侧。
A sorting method that works by repeatedly selecting an item and moving the other items around it so that all smaller items are on the one side and all the rest on its other side.
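A Python sketch of the method (my own illustration, not from the book; it picks the first item as the pivot, one of several possible choices):

```python
def quicksort(items):
    """Sort by picking a pivot and partitioning the rest around it."""
    if len(items) <= 1:
        return items
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]   # items on one side
    larger = [x for x in rest if x >= pivot]   # items on the other side
    return quicksort(smaller) + [pivot] + quicksort(larger)
```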
基数排序
radix sort
一种排序方法,其工作原理是将键分解成各个部分(例如,数值键的各位数字),并把这些元素按其各部分的值分到若干堆中(十个堆,每个数字对应一个堆)。我们首先根据最后一位数字分堆,然后将所有堆叠起来,再根据倒数第二位数字重新分堆,依此类推。当我们对第一位数字完成这一过程时,最终会得到一个已排序的堆。这是一种字符串排序方法,因为我们把数值键视为一串数字。
A sorting method that works by breaking the keys into their parts (for example, digits for numerical keys) and placing the items into piles corresponding to the values of their parts (ten piles, one for each digit). We start by forming piles based on the last digit, then we stack all piles and redistribute to piles based on the one but last digit, and so on. When we do the procedure for the first digit, we end up with a sorted pile. It is a string sorting method because we treat numerical keys as a string of digits.
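A Python sketch of the method for nonnegative integers (my own illustration, not from the book):

```python
def radix_sort(numbers, digits):
    """Sort nonnegative integers by distributing them into ten piles,
    one digit at a time, starting from the least significant digit."""
    for d in range(digits):
        piles = [[] for _ in range(10)]  # one pile per digit 0-9
        for number in numbers:
            digit = (number // 10 ** d) % 10  # d-th digit from the right
            piles[digit].append(number)
        # Stack the piles back into a single list, in order.
        numbers = [number for pile in piles for number in pile]
    return numbers
```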
随机冲浪者
random surfer
一个人通过逐页浏览网页,并根据 Google 矩阵给出的概率选择下一个页面。
A person who surfs the web by going from page to page, choosing the next page according to the probability given by the Google matrix.
随机化
randomization
算法中随机性的运用。通过这种方式,即使在计算上无法找到最优解,算法也能在大多数情况下找到问题的良好解决方案。
The use of randomness in algorithms. In this way, an algorithm may be able to find good solutions to a problem in most cases, even if it would be computationally infeasible to find the optimal solution.
记录
record
描述特定应用程序实体的一组相关数据。例如,学生记录可以包含身份数据、入学年份和成绩单。
A set of related data describing an entity for a particular application. For example, a student record can include identification data, enrollment year, and transcripts.
整流器
rectifier
激活函数将所有负输入变为零,否则其输出与输入成正比。
An activation function that turns all negative inputs to zero, or otherwise its output is directly proportional to its input.
松弛
relaxation
图算法中的一种方法,我们将最坏的可能值赋给想要求的值,然后算法会对这些值进行越来越好的估计。因此,我们从最极端的值开始,逐渐放宽这些值,使其越来越接近最终结果。
A method in graph algorithms, where we assign the worst possible value to the values we want to find, and the algorithm proceeds by producing better and better estimates for these values. We therefore start with the most extreme values possible, and gradually relax them with values that are closer and closer to the final result.
线性整流单元
ReLU
使用整流器作为激活函数的神经元。ReLU 代表整流线性单元。
A neuron that uses a rectifier as its activation function. ReLU stands for rectified linear unit.
搜索空间
search space
我们搜索的值域。
The domain of values in which we search.
秘书问题
secretary problem
一个最优停止问题。我们从候选人池中依次审查每一位候选人。我们必须当场做出是否录用的决定,不能推翻之前的决策,也不能事先审查剩余的候选人。
An optimal stopping problem. From a pool of candidates, we examine each one in turn. We must make the decision to hire or not on the spot, without being able to reverse past decisions, and without having examined the remaining candidates.
选择
selection
在算法和编程中,根据某些逻辑条件在要执行的一系列备选步骤之间做出选择。
In algorithms and programming, a choice, based on some logical condition, between alternative series of steps to be executed.
选择排序
selection sort
一种排序方法,每次我们找到未排序项中的最小值并将其放入正确的位置。
A sorting method where each time we find the minimum of the unsorted items and put it into its correct position.
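A Python sketch of the method (my own illustration, not from the book):

```python
def selection_sort(items):
    """Repeatedly find the minimum of the unsorted items and put it
    into its correct position."""
    for i in range(len(items)):
        min_index = i
        for j in range(i + 1, len(items)):
            if items[j] < items[min_index]:
                min_index = j  # remember the smallest unsorted item
        items[i], items[min_index] = items[min_index], items[i]
    return items
```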
自组织搜索
self-organizing search
搜索算法利用搜索项的受欢迎程度,将其移动到我们能够更快找到的位置。
Search algorithms that take advantage of the popularity of search items by moving them to positions where we’ll be able to find them faster.
顺序
sequence
在算法和编程中,一系列相继执行的步骤。
In algorithms and programming, a series of steps executed one after the other.
最短路径
shortest path
图中两个节点之间长度最短的路径。
The path with the minimum length between two nodes in a graph.
S 型函数(sigmoid)
sigmoid
值范围从 0 到 1 的 S 形函数。
An S-shaped function whose values range from 0 to 1.
社交网络
social network
图中节点代表人,边代表他们之间的关系。
A graph in which nodes are people, and the edges are the relationships between them.
softmax
softmax
激活函数以实数向量作为输入,并将其转换为另一个概率分布向量。
An activation function that takes as input a vector of real numbers and turns it into another vector that is a probability distribution.
软件
software
指在计算机或数字设备上运行的一组程序;该术语是对硬件的补充。这些术语在计算机出现之前就已经在不同的环境中使用。1850 年,垃圾场的拾荒者使用“软件”和“硬件”这两个术语来区分会分解的物质和其他物质。这些含义或许能给那些因计算机无法正常工作而苦恼的人带来慰藉。
The set of programs running on a computer or digital device; the term complements hardware. The terms have been used before computers in a different setting. In 1850, rubbish-tip pickers were using the terms “soft-ware” and “hard-ware” to distinguish between material that would decompose and everything else. These meanings may bring solace to anybody struggling with a computer that won’t do what it is supposed to do.
散裂
spallation
将物质破碎成更小的碎片。在核物理学中,这种物质指的是重原子核,它在被高能粒子轰击后会释放出大量质子和中子。
Breaking a material into smaller pieces. In nuclear physics, the material is a heavy nucleus that emits a large number of protons and neutrons after being bombarded with a high-energy particle.
稀疏矩阵
sparse matrix
大多数元素等于零的矩阵。
A matrix in which most elements are equal to zero.
字符串
string
符号序列。传统上,字符串是字符序列,但如今,字符串中可以包含的内容取决于实际应用;它可能是数字、字母、标点符号,甚至是最近发明的符号,例如表情符号。
A sequence of symbols. Traditionally a string was a sequence of characters, but nowadays what can go into a string depends on the actual application; it may be digits, alphabetic characters, punctuation, or even more recently invented symbols such as emojis.
字符串排序方法
string sorting method
一种将键视为符号序列的排序方法。例如,键 1234 被视为由符号 1, 2, 3, 4 组成的字符串,而不是数字 1,234。
A sorting method that treats its keys as a sequence of symbols. For example, the key 1234 is treated as the string of symbols 1, 2, 3, 4 instead of the number 1,234.
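A least-significant-digit radix sort is a classic string sorting method; a Python sketch, assuming non-negative integer keys padded to a fixed width (the function and its parameters are illustrative):

```python
def radix_sort(keys, width):
    """Sort integer keys by treating each as a string of `width` digits,
    examining one symbol position at a time, from last digit to first."""
    for pos in range(width - 1, -1, -1):
        buckets = [[] for _ in range(10)]   # one bucket per symbol 0-9
        for key in keys:
            digit = int(str(key).zfill(width)[pos])
            buckets[digit].append(key)      # stable: order inside a bucket kept
        keys = [key for bucket in buckets for key in bucket]
    return keys

print(radix_sort([1234, 56, 789, 5], 4))  # [5, 56, 789, 1234]
```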
监督学习
supervised learning
一种机器学习方法,其中我们向算法提供附有解决方案的输入问题。
A machine learning approach in which we provide an algorithm with input problems accompanied by their solutions.
突触
synapse
神经元之间的连接。
A connection between neurons.
制表机
tabulating machine
可以读取穿孔卡片并使用其上的信息进行计数的机电设备。
Electromechanical devices that could read punched cards and use the information on them to produce a tally.
tanh(双曲正切)
tanh (hyperbolic tangent)
一个类似于 S 型函数的激活函数,但其输出范围从 −1 到 1。
An activation function that looks like the sigmoid function, but its output ranges from −1 to 1.
测试数据集
test data set
我们在训练期间留出的数据,以便我们可以用它们来检查特定的机器学习方法在处理真实数据时的表现如何。
Data that we set aside during training so that we can use them to check how well a particular machine learning approach will perform with real-world data.
环游(tour)
tour
在图中,起始和终止于同一节点的路径。也称为回路。
A path that starts and ends at the same node in a graph. It is also called a circuit.
训练
training
在机器学习中,为算法提供示例输入以便其可以学习产生正确输出的过程。
In machine learning, the process of providing an algorithm with example inputs so that it can learn to produce correct outputs.
训练数据集
training data set
我们与机器学习算法配合使用的数据,用来训练它们解决问题。
Data that we use with machine learning algorithms to train them to solve problems.
转置法
transposition method
一种自组织搜索算法。当我们找到一个元素时,我们会将其与前一个元素交换。这样,热门元素就会移到前面。
A self-organizing search algorithm. When we find an element, we swap it with the one preceding it. In this way, popular items move toward the front.
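A Python sketch of the transposition method on a plain list (names are illustrative):

```python
def transposition_search(items, target):
    """Sequential search that swaps a found element with its predecessor,
    so that frequently sought items drift toward the front over time."""
    for i, item in enumerate(items):
        if item == target:
            if i > 0:
                items[i - 1], items[i] = items[i], items[i - 1]
                return i - 1   # new position of the found element
            return 0           # already at the front
    return -1                  # not found

items = ["a", "b", "c", "d"]
transposition_search(items, "c")
print(items)  # ['a', 'c', 'b', 'd']: "c" has moved one step forward
```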
旅行商问题
traveling salesman problem
又称旅行推销员问题(traveling salesperson problem),这一名称来自人们尚未在用语中多加考虑性别问题的年代。该问题是:给定一张城市列表以及每对城市之间的距离,访问每座城市恰好一次并返回出发城市的最短路线是什么?它可能是最著名的难解问题。
Also known as the traveling salesperson problem; the name dates from a time when people did not put much thought into gender in definitions. The problem asks: given a list of cities and the distances between each pair of them, what is the shortest possible route that visits each city exactly once and returns to the origin city? It is probably the most famous intractable problem.
图灵机
Turing machine
一种由艾伦·图灵描述的理想化(抽象)机器,由一条无限长的磁带和一个可移动的读写头组成,读写头按照一组既定规则在磁带上读写符号。图灵机可以实现任何算法,因此可以用作可计算模型。
An idealized (abstract) machine, described by Alan Turing, consisting of an infinite tape and a movable head that reads and writes symbols on the tape following a set of prescribed rules. The Turing machine can implement any algorithm and therefore can be used as a model of what can be computed.
一元数制
unary numeral system
使用单个符号来表示数字的数字系统;例如,一个笔画代表一个单位,因此 III 代表三。
The number system using a single symbol for representing numbers; for instance, a stroke representing a unit, so that III represents three.
无向图
undirected graph
边无向的图。
A graph in which the edges are undirected.
无监督学习
unsupervised learning
一种机器学习方法,我们向算法提供输入问题,但不提供其解决方案。机器学习算法随后必须自行推导出预期输出应该是什么,才能生成它。
A machine learning approach in which we provide an algorithm input problems without their solutions. The machine learning algorithm then must derive what the expected output should be in order to be able to produce it.
向量
vector
水平行或垂直列的数字(或更广义的数学表达式)。我们通常在几何学中遇到向量,它是一个具有长度和方向的几何实体,用包含其数值坐标的行或列来表示;然而,向量的概念比这更通用——例如,PageRank向量。向量是矩阵的一个特例。
A horizontal row or vertical column of numbers (or more generally, mathematical expressions). Usually we meet vectors in geometry, where a vector is a geometric entity with a length and direction, represented as a row or column containing its numerical coordinates; however, the notion of a vector is more general than that—take, for example, the PageRank vector. A vector is a special case of a matrix.
顶点着色
vertex coloring
为图的顶点分配颜色,使得没有两个相邻的顶点共享相同的颜色。
The assignment of colors to the vertices of a graph so that no two adjacent vertices share the same color.
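A greedy sketch in Python: color the vertices one by one, giving each the smallest color its neighbors have not already taken. Greedy coloring always produces a valid coloring, though not necessarily one with the fewest possible colors (the dictionary representation below is an assumption for illustration):

```python
def greedy_coloring(graph):
    """graph: dict mapping each vertex to a list of its neighbors."""
    colors = {}
    for vertex in graph:
        taken = {colors[n] for n in graph[vertex] if n in colors}
        color = 0
        while color in taken:          # smallest color unused by neighbors
            color += 1
        colors[vertex] = color
    return colors

# A triangle: every vertex is adjacent to the other two,
# so three colors are needed.
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
print(greedy_coloring(triangle))
```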
权重(图)
weight (graph)
附加到图的边上的数字。例如,该数字可以模拟与该边连接的节点之间的链接相关的奖励或惩罚。
A number attached to an edge of a graph. Such a number may, for example, model a reward or penalty associated with the link between the nodes connected by the edge.
权重(神经元)
weight (neuron)
附加到神经元突触上的数值。神经元从每个突触接收一个乘以该突触权重的输入。
A numerical value attached to a synapse in a neuron. From each synapse, the neuron receives an input multiplied by the weight of the synapse.
加权输入(神经元)
weighted input (neuron)
输入与神经元权重的乘积之和。
The sum of the products of the inputs with the weights of a neuron.
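In symbols, for inputs x₁, …, xₙ and weights w₁, …, wₙ the weighted input is w₁x₁ + ⋯ + wₙxₙ; a one-line Python sketch with made-up example values:

```python
def weighted_input(inputs, weights):
    """Sum of the products of a neuron's inputs with its weights."""
    return sum(x * w for x, w in zip(inputs, weights))

# Three synapses with weights 0.2, 0.4, and 0.1:
# 1.0*0.2 + 0.5*0.4 + (-2.0)*0.1 = 0.2
print(weighted_input([1.0, 0.5, -2.0], [0.2, 0.4, 0.1]))
```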
1. “算法时代”于2018年2月8日在Radio Open Source播出。
1. “The Algorithmic Age” was aired on February 8, 2018, on Radio Open Source.
3. Eric Bjorklund (1999) 给出了在散裂中子源(SNS)的时隙中分配若干脉冲的算法。Godfried Toussaint (2005) 注意到了它与节奏的相似之处,他的研究是我们阐述的基础。更深入的讨论,请参阅 Demaine 等人 (2009) 的著作。有关算法与音乐的专著,请参阅 Toussaint (2013)。
3. The algorithm for distributing a number of pulses in timing slots in the SNS was given by Eric Bjorklund (1999). Godfried Toussaint (2005) noticed the parallel with rhythms, and his work is the basis for our exposition. For a more extensive discussion, see Demaine et al. 2009. For a book-length treatment of algorithms and music, see Toussaint 2013.
4.该标准来自 Donald Knuth (1997, sec. 1),他也是从欧几里得算法开始阐述的。
4. The criteria come from Donald Knuth (1997, sec. 1), who also starts his exposition with Euclid’s algorithm.
5.有关网格路径枚举的讨论,请参阅 Knuth 2011,第 253–255 页;它是示例和路径图像的来源。有关给出可能路径数量的算法,请参阅 Iwashita 等人 2013 年的论文。
5. For a discussion of the enumeration of the paths on the grid, see Knuth 2011, 253–255; it is the source for the example and path images. For the algorithm that gives the number of possible paths, see Iwashita et al. 2013.
6.有关这些数字的描述,请参阅Tyson, Strauss, and Gott 2016, 18–20。在戴夫·艾格斯的小说《圆圈》中,一家几乎不加掩饰的科技公司计算了撒哈拉沙漠中沙粒的数量。
6. For these number descriptions, see Tyson, Strauss, and Gott 2016, 18–20. In Dave Eggers’s novel The Circle, a thinly disguised technology company calculates the number of grains of sand in the Sahara Desert.
7. 要将纸对折 n 次,纸张必须足够大。如果总是沿同一方向折叠,则需要一张长纸,其长度由公式 L = (πt/6)(2ⁿ + 4)(2ⁿ − 1) 给出,其中 t 是纸张的厚度,n 是折叠次数。如果沿交替方向折叠一张正方形纸,则正方形的宽度必须为 W = πt·2^(3(n−1)/2)。这些公式之所以比简单的 2 的幂更复杂,是因为每次折叠纸张时,纸张沿折叠边缘弯曲,都会损失一部分;正是通过计算这些曲线,π 才出现在这些公式中。这些公式由当时还是高中生的 Britney Crystal Gallivan 于 2002 年发现。她随后演示了一张 1,200 米长的卫生纸可以对折 12 次。有关幂的力量(包括此示例)的精彩介绍,请参阅 Strogatz 2012 第 11 章。
7. To fold paper n times, the paper must be large enough. If you fold it always along the same dimension, you will need a long sheet of paper. The length is given by the formula L = (πt/6)(2ⁿ + 4)(2ⁿ − 1), where t is the paper’s thickness and n is the number of folds. If you fold a square sheet of paper in alternate directions, then the width of the square must be W = πt · 2^(3(n−1)/2). The reason why the formulas are more complicated than simple powers of two is that every time you fold the paper, you lose some part of it as it curves along the edge of the fold; it’s from calculating these curves that π enters the picture in these formulas. The formulas were found in 2002 by Britney Crystal Gallivan, then a junior in high school. She went on to demonstrate that a 1,200 meters–long sheet of toilet paper could be folded in half 12 times. For a nice introduction to the power of powers (including this example), see Strogatz 2012, chapter 11.
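Gallivan's single-direction folding formula, as commonly stated, is L = (πt/6)(2ⁿ + 4)(2ⁿ − 1). A Python sketch that evaluates it, using an assumed thickness of 0.1 mm for illustration:

```python
import math

def min_length(t, n):
    """Minimum sheet length for n folds in the same direction,
    per Gallivan's formula L = (pi*t/6) * (2**n + 4) * (2**n - 1)."""
    return (math.pi * t / 6) * (2**n + 4) * (2**n - 1)

# For a sheet 0.1 mm thick, the required length grows roughly
# fourfold with each additional fold (lengths shown in meters):
for n in (8, 10, 12):
    print(n, round(min_length(0.1, n) / 1000, 1))
```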
8. “晶体管数量”,维基百科,https://en.wikipedia.org/wiki/Transistor_count。
8. “Transistor Count,” Wikipedia, https://en.wikipedia.org/wiki/Transistor_count.
9. 这是因为,要将 n 个项目两两比较,你需要取其中一个并将其与所有其他项目比较,然后取另一个并将其与其余项目比较(它与第一个项目的比较已经完成),依此类推。这样得到 (n − 1) + (n − 2) + ⋯ + 1 = n(n − 1)/2 次比较。于是得到 O(n²),因为根据大 O 的定义,如果你的算法在 n(n − 1)/2 时间内运行,那么它肯定也能在 n² 时间内运行。
9. That is because to compare n items between them, you need to take one of them and compare it to all the other items, then you take another one and compare it to the other items (you have already compared it to the first item you used), and so on. That gives (n − 1) + (n − 2) + ⋯ + 1 = n(n − 1)/2 comparisons. Then you get O(n²), because according to the definition of big O, if your algorithm runs in time n(n − 1)/2, it will certainly run in time n².
1.图片取自维基百科共享资源https://commons.wikimedia.org/wiki/File:Konigsberg_Bridge.png。该图像属于公共领域。
1. Image retrieved from the Wikipedia Commons at https://commons.wikimedia.org/wiki/File:Konigsberg_Bridge.png. The image is in the public domain.
2.这篇论文(Eulerho 1736)可在美国数学协会维护的欧拉档案馆(http://eulerarchive.maa.org)获取。英文译本请参见 Biggs, Lloyd 和 Wilson 1986 年著。
2. The paper (Eulerho 1736) is available from the Euler Archive (http://eulerarchive.maa.org), maintained by the Mathematical Association of America. For an English translation, see Biggs, Lloyd, and Wilson 1986.
3.关于图表的文献浩如烟海,图表本身也同样如此。想要了解一些入门知识,可以参阅 Benjamin、Chartrand 和 Zhang 2015 的著作。
3. The literature on graphs is vast, as is the subject itself. For a good starting point, see Benjamin, Chartrand, and Zhang 2015.
4.图片来自原始出版物(Eulerho 1736),取自维基百科共享资源https://commons.wikimedia.org/wiki/File:Solutio_problematis_ad_geometriam_situs_pertinentis,_Fig._1.png。该图像属于公共领域。
4. Image from the original publication (Eulerho 1736) retrieved from the Wikipedia Commons at https://commons.wikimedia.org/wiki/File:Solutio_problematis_ad_geometriam_situs_pertinentis,_Fig._1.png. The image is in the public domain.
5.图片取自凯库勒(Kekulé)1872 年的作品,摘自维基百科https://en.wikipedia.org/wiki/Benzene#/media/File:Historic_Benzene_Formulae_Kekul%C3%A9_(original).png。该图片属于公共领域。
5. Image from Kekulé 1872, retrieved from the Wikipedia at https://en.wikipedia.org/wiki/Benzene#/media/File:Historic_Benzene_Formulae_Kekul%C3%A9_(original).png. The image is in the public domain.
7.有关 Hierholzer 算法和其他欧拉路径算法的更多详细信息,请参阅 Fleischner 1991。有关图在基因组组装中的应用,请参阅 Pevzner、Tang 和 Waterman 2001;Compeau、Pevzner 和 Tesler 2011。
7. For more details on Hierholzer’s algorithm and other algorithms for Eulerian paths, see Fleischner 1991. For the use of graphs in genome assembly, see Pevzner, Tang, and Waterman 2001; Compeau, Pevzner, and Tesler 2011.
8.有关在线边着色贪婪算法的最优性的分析,以及显示最坏情况的星形图示例,请参阅 Bar-Noy、Motwani 和 Naor 1992。
8. For an analysis of the optimality of the greedy algorithm for online edge coloring, as well as the example of the starlike graph to show the worst case, see Bar-Noy, Motwani, and Naor 1992.
1.有关马太效应的首次描述,请参阅 Merton 1968。有关一系列体现不平等分配现象的概述,请参阅 Barabási 和 Márton 2016;West 2017。有关体育场高度和贫富差距,请参阅 Taleb 2007。
1. For the first description of the Matthew effect, see Merton 1968. For overviews of the range of phenomena manifesting unequal distributions, see Barabási and Márton 2016; West 2017. For the stadium height and wealth disparity, see Taleb 2007.
2. John McCabe(1965)提出了一种自组织搜索方法。有关移至前端和转置方法性能的分析,请参阅 Rivest 1976;Bachrach、El-Yaniv 和 Reinstädtler 2002。
2. John McCabe (1965) presented a self-organized search. For analyses of the performance of the move-to-front and transposition methods, see Rivest 1976; Bachrach, El-Yaniv, and Reinstädtler 2002.
3.秘书问题出现在马丁·加德纳1960年2月发表于《科学美国人》的专栏中。该问题的解决方案出现在1960年3月刊中。有关其历史,请参阅弗格森(1989年)。J.尼尔·比尔登(J. Neil Bearden)(2006年)提供了非全有或全无变体的解决方案。马特·帕克(Matt Parker)(2014年,第11章)介绍了这个问题,以及其他一些数学思想和计算机入门知识。
3. The secretary problem appeared in Martin Gardner’s column in February 1960 in Scientific American. A solution was given in the March 1960 issue. For its history, see Ferguson 1989. J. Neil Bearden (2006) provided the solution for the not all-or-nothing variant. Matt Parker (2014, chapter 11) presents the problem, along with several other mathematical ideas and an introduction to computers.
4.二分查找可以追溯到计算机时代的黎明(Knuth 1998)。第一台通用电子数字计算机 ENIAC 的设计者之一约翰·莫奇利 (John Mauchly) 于 1946 年对其进行了描述。有关二分查找的曲折历史,请参阅 Bentley 2000;Pattis 1988;Bloch 2006。
4. Binary search goes back to the dawn of the computer age (Knuth 1998). John Mauchly, one of the designers of the ENIAC, the first general-purpose electronic digital computer, described it in 1946. For the checkered history of binary search, see Bentley 2000; Pattis 1988; Bloch 2006.
2.自计算机诞生以来,选择排序和插入排序就一直伴随着我们;它们被包含在 20 世纪 50 年代发表的一项排序调查中(Friend 1956)。
2. Selection and insertion sort have been with us since the dawn of computers; they were included in a survey of sorting published in the 1950s (Friend 1956).
3.根据 Knuth (1998, 170) 的说法,我们在此处看到的基数排序背后的想法似乎至少从 20 世纪 20 年代就已经出现。
3. According to Knuth (1998, 170), the idea behind radix sort that we have seen here seems to have been around at least since the 1920s.
4. 抛硬币 226 次这一数字可由相应的计算得出。从地球中取出一个原子的例子来自 David Hand (2014),他认为低于某一极小阈值的概率在宇宙尺度上可以忽略不计。
4. Flipping the coin 226 times follows from the corresponding calculation. The example of picking an atom from the earth is from David Hand (2014), according to whom probabilities below a certain minuscule threshold are negligible on the cosmic scale.
1.最初的 PageRank 算法由 Brin 和 Page (1998) 发表。我们略过了该算法所使用的数学原理。更深入的阐述,请参阅 Bryan 和 Leise (2006)。关于搜索引擎的介绍以及 PageRank,参见 Langville 和 Meyer 2006;Berry 和 Browne 2005。除了 PageRank,另一个重要的排名算法是超文本诱导主题搜索(Hypertext Induced Topic Search,简称 HITS)(Kleinberg 1998, 1999),该算法在 PageRank 之前就已开发。类似的思想在其他领域(社会测量学,即社会关系的定量研究;计量经济学,即经济学原理的定量研究)早已发展起来,可以追溯到 20 世纪 40 年代(Franceschet 2011)。
1. The original PageRank algorithm was published by Brin and Page (1998). We glossed over the mathematics used by the algorithm. For a more in-depth treatment, see Bryan and Leise 2006. For an introduction to search engines and PageRank, see Langville and Meyer 2006; Berry and Browne 2005. Apart from PageRank, another important algorithm used for ranking is Hypertext Induced Topic Search, or HITS (Kleinberg 1998, 1999), developed before PageRank. Similar ideas had been developed in other fields (sociometry, the quantitative study of social relationships, and econometrics, the quantitative study of economic principles) much earlier, going back to the 1940s (Franceschet 2011).
1.尽管如今我们能够利用科技手段更清晰地观察神经元,但拉蒙·卡哈尔 (Ramón y Cajal) 仍是一位先驱,他的绘画堪称科学史上最精美的插图之一。网络上有很多关于神经元的图片,但这幅图对我们来说已经足够了,只需简单搜索一下,就能感受到拉蒙·卡哈尔插图的优美和持久力量。该图片属于公共领域,取自https://commons.wikimedia.org/wiki/File:PurkinjeCell.jpg。
1. Although today we can use technology to see neurons in much greater detail, Ramón y Cajal was a pioneer, and his drawings rank among the most elegant illustrations in the history of science. You can find neuron images aplenty on the web, but this image is enough for us, and a simple web search should convince you of the beauty and enduring power of Ramón y Cajal’s illustrations. The image is in the public domain, retrieved from https://commons.wikimedia.org/wiki/File:PurkinjeCell.jpg.
2. 准确地说,sigmoid 指的是希腊字母 sigma(Σ),但它的外观更接近拉丁字母 S。
2. To be accurate, sigmoid would refer to the Greek letter sigma, which is Σ, yet its appearance is closer to the Latin S.
3.角的正切定义为直角三角形中对边与邻边的比值,或者换句话说,等于单位圆中该角的正弦除以该角的余弦。双曲正切定义为双曲线上某个角的双曲正弦与双曲余弦的比值。
3. The tangent of an angle is defined as the ratio of the opposite side to the adjacent side in a right triangle, or equivalently, by the sine of the angle divided by the cosine of the angle in the unit circle. The hyperbolic tangent is defined as the ratio of the hyperbolic sine to the hyperbolic cosine of an angle on a hyperbola.
4. Warren McCulloch 和 Walter Pitts (1943) 提出了第一个人工神经元。Frank Rosenblatt (1957) 描述了感知器。如果它们已有半个多世纪的历史,那神经网络为何最近才变得如此流行?Marvin Minsky 和 Seymour Papert (1969) 在他们著名的同名著作中对感知器进行了沉重的批判,指出单个感知器存在根本的计算限制。这一点,再加上当时硬件的限制,导致了神经计算的寒冬,一直持续到 20 世纪 80 年代,研究人员找到了构建和训练复杂神经网络的方法。随后,人们对该领域的兴趣又重新燃起,但要将神经网络推进到我们在过去几年中看到的引人注目的结果,仍然需要做大量的工作。
4. Warren McCulloch and Walter Pitts (1943) proposed the first artificial neuron. Frank Rosenblatt (1957) described the Perceptron. If they are more than half a century old, how come neural networks have become all the rage recently? Marvin Minsky and Seymour Papert (1969) struck a major blow to Perceptrons in their famous book of the same name, which showed that a single Perceptron had fundamental computing limitations. This, coupled with the hardware limitations of the time, ushered in a so-called winter in neural computation, which lasted well until the 1980s, when researchers found how to build and train complex neural networks. Interest in the field then revived, but still a lot more work was required to advance neural networks to the media-grabbing results that we have been seeing in the last few years.
5. 神经网络的挑战之一是其符号可能令人望而却步,因此这些内容似乎只有入门者才能理解。事实上,一旦你了解了它的含义,它其实并不复杂。你会经常见到导数;函数 f 关于 x 的导数写为 df/dx。含多个变量(例如 x₁, x₂, ……, xₙ)的多元函数 f 的偏导数写为 ∂f/∂xᵢ。梯度写为 ∇f。
5. One of the challenges in neural networks is that the notation can be off-putting and hence the material seems approachable only to the initiated. In fact, it is not that complicated once you know what it is about. You often see derivatives; the derivative of a function f with respect to x is written df/dx. The partial derivative of a function f of many variables, say, x₁, x₂, . . . , xₙ, is written ∂f/∂xᵢ. The gradient is written ∇f.
6. 反向传播算法于 20 世纪 80 年代中期出现(Rumelhart、Hinton 和 Williams 1986),尽管它的各种推导早在 20 世纪 60 年代就已出现。
6. The backpropagation algorithm came onto the scene in the mid-1980s (Rumelhart, Hinton, and Williams 1986), although various derivations of it had appeared back in the 1960s.
7.此图像来自 Fashion-MNIST 数据(Xiao、Rasul 和 Vollgraf 于 2017 年发表),该数据集是作为机器学习的基准数据集开发的。本节内容灵感来自 TensorFlow 基础分类教程(网址:https://www.tensorflow.org/tutorials/keras/basic_classification ) 。
7. This image is from the Fashion-MNIST data (Xiao, Rasul, and Vollgraf 2017), which was developed as a benchmark data set for machine learning. This section was inspired by the basic classification TensorFlow tutorial at https://www.tensorflow.org/tutorials/keras/basic_classification.
8.有关第一个击败围棋人类冠军的系统的描述,请参阅 Silver 等人 2016 年的文章。有关不需要人类以先前玩过的游戏形式获取知识的改进系统,请参阅 Silver 等人 2017 年的文章。
8. For a description of the first system to beat the Go human champion, see Silver et al. 2016. For an improved system that does not require human knowledge in the form of previously played games, see Silver et al. 2017.
9.深度学习的文献浩如烟海。有关该主题的全面介绍,请参阅 Goodfellow、Bengio 和 Courville 2016 年的著作。有关更简短易懂的论述,请参阅 Charniak 2018 年的著作。有关简明概述,请参阅 LeCun、Bengio 和 Hinton 2015 年的著作。有关深度学习和机器学习,请参阅 Alpaydin 2016 年的著作。有关自动神经架构搜索方法的综述,请参阅 Elsken、Hendrik Metzen 和 Hutter 2018 年的著作。
9. The literature on deep learning is vast. For a comprehensive introduction to the topic, see Goodfellow, Bengio, and Courville 2016. For a shorter and more approachable treatment, see Charniak 2018. For a concise overview, see LeCun, Bengio, and Hinton 2015. For deep and machine learning, see Alpaydin 2016. For a survey of automated neural architecture search methods, see Elsken, Hendrik Metzen, and Hutter 2018.
1.除了图灵之外,入围名单上的其他名字还有玛丽·安宁、保罗·狄拉克、罗莎琳·富兰克林、威廉·赫歇尔和卡罗琳·赫歇尔、多萝西·霍奇金、阿达·洛夫莱斯和查尔斯·巴贝奇、斯蒂芬·霍金、詹姆斯·克拉克·麦克斯韦、斯里尼瓦瑟·拉马努金、欧内斯特·卢瑟福和弗雷德里克·桑格。巴贝奇、洛夫莱斯和图灵都是计算机先驱。巴贝奇 (1791–1871) 发明了第一台机械计算机,并发展了现代计算机的基本思想。拜伦勋爵的女儿洛夫莱斯 (1815–1852) 曾与巴贝奇一起工作,认识到了他发明的潜力,并第一个开发出可在这种机器上运行的算法。她现在被认为是第一位计算机程序员。有关 50 英镑的设计,请参阅官方公告https://www.bankofengland.co.uk/news/2019/july/50-pound-banknote-character-announcement。
1. Besides Turing, other names on the short list were Mary Anning, Paul Dirac, Rosalind Franklin, William Herschel and Caroline Herschel, Dorothy Hodgkin, Ada Lovelace and Charles Babbage, Stephen Hawking, James Clerk Maxwell, Srinivasa Ramanujan, Ernest Rutherford, and Frederick Sanger. Babbage, Lovelace, and Turing were all computer pioneers. Babbage (1791–1871) invented the first mechanical computer and developed the essential ideas of modern computers. Lovelace (1815–1852), the daughter of Lord Byron, worked with Babbage, recognized the potential of his invention, and was the first to develop an algorithm that would run on such a machine. She is now considered to have been the first computer programmer. For the £50 design, see the official announcement at https://www.bankofengland.co.uk/news/2019/july/50-pound-banknote-character-announcement.
2.请参阅安德鲁·霍奇斯 (Andrew Hodges) 1983 年撰写的精彩传记。图灵在破解德国恩尼格玛密码机中所扮演的角色,在 2014 年的电影《模仿游戏》中被戏剧化地展现出来。
2. See the excellent biography by Andrew Hodges (1983). Turing’s role in breaking the German Enigma cryptographic machine was dramatized in the 2014 film The Imitation Game.
4.图灵机示例改编自 John Hopcroft、Rajeev Motwani 和 Jeffrey Ullman (2001 年,第 8 章)。该图基于 Sebastian Sardina 的示例,网址为http://www.texample.net/tikz/examples/turing-machine-2/。
4. The Turing machine example is adapted from John Hopcroft, Rajeev Motwani, and Jeffrey Ullman (2001, chapter 8). The figure is based on Sebastian Sardina’s example at http://www.texample.net/tikz/examples/turing-machine-2/.
5.有关丘奇-图灵论题的更多信息,请参阅 Lewis 和 Papadimitriou 1998 年著作第 5 章。有关丘奇-图灵论题的历史及其各种变体的讨论,请参阅 Copeland 和 Shagrir 2019 年著作。
5. For more on the Church-Turing thesis, see Lewis and Papadimitriou 1998, chapter 5. For a discussion of the history of the Church-Turing thesis and various variants, see Copeland and Shagrir 2019.
Alpaydin, Ethem. 2016.机器学习. 马萨诸塞州剑桥:麻省理工学院出版社。
Alpaydin, Ethem. 2016. Machine Learning. Cambridge, MA: MIT Press.
Bachrach, Ran, Ran El-Yaniv 和 Martin Reinstädtler。2002 年。“在线列表访问算法的竞争理论与实践。” Algorithmica 32 (2): 201–245。
Bachrach, Ran, Ran El-Yaniv, and Martin Reinstädtler. 2002. “On the Competitive Theory and Practice of Online List Accessing Algorithms.” Algorithmica 32 (2): 201–245.
巴拉巴西、阿尔伯特·拉斯洛和波斯法伊·马顿。 2016.网络科学。剑桥:剑桥大学出版社。
Barabási, Albert-László, and Pósfai Márton. 2016. Network Science. Cambridge: Cambridge University Press.
Bar-Noy、Amotz、Rajeev Motwani 和 Joseph Naor。1992 年。“贪婪算法是在线边着色的最佳算法。” 《信息处理快报》 44 (5): 251–253。
Bar-Noy, Amotz, Rajeev Motwani, and Joseph Naor. 1992. “The Greedy Algorithm Is Optimal for Online Edge Coloring.” Information Processing Letters 44 (5): 251–253.
Bearden, J. Neil。2006 年。“基于等级选择和基数支付的新秘书问题。” 《数学心理学杂志》 50:58–59。
Bearden, J. Neil. 2006. “A New Secretary Problem with Rank-Based Selection and Cardinal Payoffs.” Journal of Mathematical Psychology 50:58–59.
Benjamin, Arthur、Gary Chartrand 和 Ping Zhang。2015。《迷人的图论世界》。新泽西州普林斯顿:普林斯顿大学出版社。
Benjamin, Arthur, Gary Chartrand, and Ping Zhang. 2015. The Fascinating World of Graph Theory. Princeton, NJ: Princeton University Press.
Bentley, Jon. 2000.编程珠玑. 第二版. 波士顿: Addison-Wesley.
Bentley, Jon. 2000. Programming Pearls. 2nd ed. Boston: Addison-Wesley.
Berry, Michael W. 和 Murray Browne。2005。《理解文本引擎:数学建模与文本检索》。第二版。费城:工业与应用数学学会。
Berry, Michael W., and Murray Browne. 2005. Understanding Text Engines: Mathematical Modeling and Text Retrieval. 2nd ed. Philadelphia: Society for Industrial and Applied Mathematics.
Biggs, Norman L., E. Keith Lloyd 和 Robin J. Wilson。1986 年。《图论,1736–1936》。牛津:克拉伦登出版社。
Biggs, Norman L., E. Keith Lloyd, and Robin J. Wilson. 1986. Graph Theory, 1736–1936. Oxford: Clarendon Press.
Bjorklund, Eric。1999年。“SNS计时系统中重复频率模式生成理论。”SNS-NOTE-CNTRL-99。散裂中子源。https ://ics-web.sns.ornl.gov/timing/Rep-Rate%20Tech%20Note.pdf。
Bjorklund, Eric. 1999. “The Theory of Rep-Rate Pattern Generation in the SNS Timing System.” SNS-NOTE-CNTRL-99. Spallation Neutron Source. https://ics-web.sns.ornl.gov/timing/Rep-Rate%20Tech%20Note.pdf.
Bloch, Joshua。2006 年。“额外补充——全面了解:几乎所有二分查找和归并排序都失效了。” Google AI 博客,6 月 2 日。http://googleresearch.blogspot.it/2006/06/extra-extra-read-all-about-it-nearly.html。
Bloch, Joshua. 2006. “Extra, Extra—Read All about It: Nearly All Binary Searches and Mergesorts Are Broken.” Google AI Blog, June 2. http://googleresearch.blogspot.it/2006/06/extra-extra-read-all-about-it-nearly.html.
Brin, Sergey 和 Lawrence Page。1998 年。“大型超文本 Web 搜索引擎的剖析。”计算机网络与 ISDN 系统30 (1-7): 107-117。
Brin, Sergey, and Lawrence Page. 1998. “The Anatomy of a Large-Scale Hypertextual Web Search Engine.” Computer Networks and ISDN Systems 30 (1–7): 107–117.
Bryan, Kurt 和 Tanya Leise。2006 年。“价值 250 亿美元的特征向量:谷歌背后的线性代数。” 《SIAM 评论》 48 (3): 569–581。
Bryan, Kurt, and Tanya Leise. 2006. “The $25,000,000,000 Eigenvector: The Linear Algebra behind Google.” SIAM Review 48 (3): 569–581.
Charniak, Eugene。2018。深度学习简介。马萨诸塞州剑桥:麻省理工学院出版社。
Charniak, Eugene. 2018. Introduction to Deep Learning. Cambridge, MA: MIT Press.
Compeau、Phillip EC、Pavel A. Pevzner 和 Glenn Tesler。2011 年。“如何将德布鲁因图应用于基因组组装。” 《自然生物技术》 29 (11): 987–991。
Compeau, Phillip E. C., Pavel A. Pevzner, and Glenn Tesler. 2011. “How to Apply de Bruijn Graphs to Genome Assembly.” Nature Biotechnology 29 (11): 987–991.
Copeland, B. Jack 和 Oron Shagrir。2019 年。“丘奇-图灵论题:逻辑极限还是可突破的障碍?” ACM 通讯62 (1): 66–74。
Copeland, B. Jack, and Oron Shagrir. 2019. “The Church-Turing Thesis: Logical Limit or Breachable Barrier?” Communications of the ACM 62 (1): 66–74.
Demaine, Erik D., Francisco Gomez-Martin, Henk Meijer, David Rappaport, Perouz Taslakian, Godfried T. Toussaint, Terry Winograd 和 David R. Wood。2009年。“音乐的距离几何。” 《计算几何:理论与应用》 42 (5): 429–454。
Demaine, Erik D., Francisco Gomez-Martin, Henk Meijer, David Rappaport, Perouz Taslakian, Godfried T. Toussaint, Terry Winograd, and David R. Wood. 2009. “The Distance Geometry of Music.” Computational Geometry: Theory and Applications 42 (5): 429–454.
戴森,乔治。2012。图灵大教堂:数字宇宙的起源。纽约:Vintage Books。
Dyson, George. 2012. Turing’s Cathedral: The Origins of the Digital Universe. New York: Vintage Books.
Elsken, Thomas、Jan Hendrik Metzen 和 Frank Hutter。2018 年。“神经架构搜索:综述”。康奈尔大学 ArXiv。8 月 16 日。http ://arxiv.org/abs/1808.05377。
Elsken, Thomas, Jan Hendrik Metzen, and Frank Hutter. 2018. “Neural Architecture Search: A Survey.” ArXiv, Cornell University. August 16. http://arxiv.org/abs/1808.05377.
欧拉霍,莱昂哈多。1736。“Solutio Problematis ad Geometriam Situs Pertinentis。”Commentarii Academiae Scientiarum Imperialis Petropolitanae 8:128–140。
Eulerho, Leonhardo. 1736. “Solutio Problematis ad Geometriam Situs Pertinentis.” Commentarii Academiae Scientiarum Imperialis Petropolitanae 8:128–140.
Ferguson, Thomas S. 1989.“谁解决了秘书问题?”统计科学4 (3): 282–289。
Ferguson, Thomas S. 1989. “Who Solved the Secretary Problem?” Statistical Science 4 (3): 282–289.
Fleischner, Herbert 编,1991 年。“第十章:欧拉路径与环路分解算法,迷宫搜索算法。”载于《欧拉图及相关主题》,50:X.1–X.34。阿姆斯特丹:爱思唯尔。
Fleischner, Herbert, ed. 1991. “Chapter X Algorithms for Eulerian Trails and Cycle Decompositions, Maze Search Algorithms.” In Eulerian Graphs and Related Topics, 50:X.1–X.34. Amsterdam: Elsevier.
Franceschet, Massimo. 2011. “PageRank:站在巨人的肩膀上。” 《ACM通讯》 54 (6): 92–101。
Franceschet, Massimo. 2011. “PageRank: Standing on the Shoulders of Giants.” Communications of the ACM 54 (6): 92–101.
Friend, Edward H. 1956.“电子计算机系统上的排序。” Journal of the ACM 3 (3): 134–168。
Friend, Edward H. 1956. “Sorting on Electronic Computer Systems.” Journal of the ACM 3 (3): 134–168.
Goodfellow, Ian、Yoshua Bengio 和 Aaron Courville。2016 年。深度学习。马萨诸塞州剑桥:麻省理工学院出版社。
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. Cambridge, MA: MIT Press.
Hand, David J. 2014. 《不可能性原理:巧合、奇迹和罕见事件为何每天都会发生》 . 纽约: Farrar, Straus and Giroux。
Hand, David J. 2014. The Improbability Principle: Why Coincidences, Miracles, and Rare Events Happen Every Day. New York: Farrar, Straus and Giroux.
霍金,斯蒂芬。1988。《时间简史》。纽约:班塔姆图书。
Hawking, Stephen. 1988. A Brief History of Time. New York: Bantam Books.
希尔霍尔泽、卡尔. 1873.“Ueber die Möglichkeit, einen Linienzug ohne Wiederholung und ohne Unterbrechung zu Umfahren”。数学年鉴6 (1):30-32。
Hierholzer, Carl. 1873. “Ueber die Möglichkeit, einen Linienzug ohne Wiederholung und ohne Unterbrechung zu Umfahren.” Mathematische Annalen 6 (1): 30–32.
Hoare, C. A. R. 1961a. “算法 63:分割。”《ACM 通讯》4 (7): 321。
Hoare, C. A. R. 1961a. “Algorithm 63: Partition.” Communications of the ACM 4 (7): 321.
Hoare, C. A. R. 1961b. “算法 64:快速排序。”《ACM 通讯》4 (7): 321。
Hoare, C. A. R. 1961b. “Algorithm 64: Quicksort.” Communications of the ACM 4 (7): 321.
Hoare, C. A. R. 1961c. “算法 65:查找。”《ACM 通讯》4 (7): 321–322。
Hoare, C. A. R. 1961c. “Algorithm 65: Find.” Communications of the ACM 4 (7): 321–322.
霍奇斯,安德鲁。1983。艾伦·图灵:谜。纽约:西蒙与舒斯特出版社。
Hodges, Andrew. 1983. Alan Turing: The Enigma. New York: Simon and Schuster.
Hollerith, Herman。1894 年。“电子制表机。” 《皇家统计学会杂志》 57 (4): 678–689。
Hollerith, Herman. 1894. “The Electrical Tabulating Machine.” Journal of the Royal Statistical Society 57 (4): 678–689.
Hopcroft, John E.、Rajeev Motwani 和 Jeffrey D. Ullman。2001。自动机理论、语言和计算简介。第二版。波士顿:Addison-Wesley。
Hopcroft, John E., Rajeev Motwani, and Jeffrey D. Ullman. 2001. Introduction to Automata Theory, Languages, and Computation. 2nd ed. Boston: Addison-Wesley.
Iwashita, Hiroaki, Yoshio Nakazawa, Jun Kawahara, Takeaki Uno, Shin-ichi Minato。2013年。“利用最小完美哈希函数高效计算网格图中的路径数。”技术报告TCS-TR-A-13-64。北海道大学信息科学技术研究生院计算机科学系。
Iwashita, Hiroaki, Yoshio Nakazawa, Jun Kawahara, Takeaki Uno, and Shin-ichi Minato. 2013. “Efficient Computation of the Number of Paths in a Grid Graph with Minimal Perfect Hash Functions.” Technical Report TCS-TR-A-13-64. Division of Computer Science, Graduate School of Information Science and Technology, Hokkaido University.
凯库勒,八月。 1872.“Ueber Einige Condensationsprodukte Des Aldehyds”。化学与药学年鉴162 (1): 77–124。
Kekulé, August. 1872. “Ueber Einige Condensationsprodukte Des Aldehyds.” Annalen der Chemie und Pharmacie 162 (1): 77–124.
Kleinberg, Jon M. 1998. “超链接环境中的权威来源”。载于第九届 ACM-SIAM 离散算法研讨会论文集,第 668–677 页。费城:工业与应用数学学会。
Kleinberg, Jon M. 1998. “Authoritative Sources in a Hyperlinked Environment.” In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, 668–677. Philadelphia: Society for Industrial and Applied Mathematics.
Kleinberg, Jon M. 1999.“超链接环境中的权威来源。” Journal of the ACM 46 (5): 604–632。
Kleinberg, Jon M. 1999. “Authoritative Sources in a Hyperlinked Environment.” Journal of the ACM 46 (5): 604–632.
Knuth, Donald E. 1970.“冯·诺依曼的第一个计算机程序。”计算概览2 (4): 247–261。
Knuth, Donald E. 1970. “Von Neumann’s First Computer Program.” Computing Surveys 2 (4): 247–261.
Knuth, Donald E. 1972.“古巴比伦算法。” 《ACM通讯》 15 (7): 671–677。
Knuth, Donald E. 1972. “Ancient Babylonian Algorithms.” Communications of the ACM 15 (7): 671–677.
Knuth, Donald E. 1997.计算机编程艺术,第 1 卷:基本算法。第 3 版。Reading, MA: Addison-Wesley。
Knuth, Donald E. 1997. The Art of Computer Programming, Volume 1: Fundamental Algorithms. 3rd ed. Reading, MA: Addison-Wesley.
Knuth, Donald E. 1998. 《计算机编程艺术》第3卷:排序和搜索。第二版。Reading, MA: Addison-Wesley。
Knuth, Donald E. 1998. The Art of Computer Programming, Volume 3: Sorting and Searching. 2nd ed. Reading, MA: Addison-Wesley.
Knuth, Donald E. 2011.计算机编程艺术,第 4A 卷:组合算法,第 1 部分。Upper Saddle River,新泽西州:Addison-Wesley。
Knuth, Donald E. 2011. The Art of Computer Programming, Volume 4A: Combinatorial Algorithms, Part 1. Upper Saddle River, NJ: Addison-Wesley.
Langville, Amy N. 和 Carl D. Meyer。2006。Google的 PageRank 及其超越:搜索引擎排名的科学。新泽西州普林斯顿:普林斯顿大学出版社。
Langville, Amy N., and Carl D. Meyer. 2006. Google’s PageRank and Beyond: The Science of Search Engine Rankings. Princeton, NJ: Princeton University Press.
LeCun、Yann、Yoshua Bengio 和 Geoffrey Hinton。2015 年。“深度学习。” 《自然》 521 (7553): 436–444。
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. 2015. “Deep Learning.” Nature 521 (7553): 436–444.
Lewis, Harry R., and Christos H. Papadimitriou. 1998. Elements of the Theory of Computation. 2nd ed. Upper Saddle River, NJ: Prentice Hall.
McCabe, John. 1965. “On Serial Files with Relocatable Records.” Operations Research 13 (4): 609–618.
McCulloch, Warren S., and Walter Pitts. 1943. “A Logical Calculus of the Ideas Immanent in Nervous Activity.” Bulletin of Mathematical Biophysics 5 (4): 115–133.
Merton, Robert K. 1968. “The Matthew Effect in Science.” Science 159 (3810): 56–63.
Minsky, Marvin, and Seymour Papert. 1969. Perceptrons: An Introduction to Computational Geometry. Cambridge, MA: MIT Press.
Misa, Thomas J., and Philip L. Frana. 2010. “An Interview with Edsger W. Dijkstra.” Communications of the ACM 53 (8): 41–47.
Mitzenmacher, Michael, and Eli Upfal. 2017. Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis. 2nd ed. Cambridge: Cambridge University Press.
Parker, Matt. 2014. Things to Make and Do in the Fourth Dimension: A Mathematician’s Journey through Narcissistic Numbers, Optimal Dating Algorithms, at Least Two Kinds of Infinity, and More. London: Penguin Books.
Pattis, Richard E. 1988. “Textbook Errors in Binary Searching.” SIGCSE Bulletin 20 (1): 190–194.
Pevzner, Pavel A., Haixu Tang, and Michael S. Waterman. 2001. “An Eulerian Path Approach to DNA Fragment Assembly.” Proceedings of the National Academy of Sciences 98 (17): 9748–9753.
Pinker, Steven. 2018. Enlightenment Now: The Case for Reason, Science, Humanism, and Progress. New York: Viking Press.
Rivest, Ronald. 1976. “On Self-Organizing Sequential Search Heuristics.” Communications of the ACM 19 (2): 63–67.
Rosenblatt, Frank. 1957. “The Perceptron: A Perceiving and Recognizing Automaton.” Report 85-460-1. Cornell Aeronautical Laboratory.
Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. 1986. “Learning Representations by Back-Propagating Errors.” Nature 323 (6088): 533–536.
Silver, David, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, et al. 2016. “Mastering the Game of Go with Deep Neural Networks and Tree Search.” Nature 529 (7587): 484–489.
Silver, David, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, et al. 2017. “Mastering the Game of Go without Human Knowledge.” Nature 550 (7676): 354–359.
Strogatz, Steven. 2012. The Joy of x: A Guided Tour of Math, from One to Infinity. New York: Houghton Mifflin Harcourt.
Taleb, Nassim Nicholas. 2007. The Black Swan: The Impact of the Highly Improbable. New York: Random House.
Toussaint, Godfried T. 2005. “The Euclidean Algorithm Generates Traditional Musical Rhythms.” In Renaissance Banff: Mathematics, Music, Art, Culture, edited by Reza Sarhangi and Robert V. Moody, 47–56. Winfield, KS: Bridges Conference, Southwestern College.
Toussaint, Godfried T. 2013. The Geometry of Musical Rhythm: What Makes a “Good” Rhythm Good? Boca Raton, FL: CRC Press.
Turing, Alan M. 1937. “On Computable Numbers, with an Application to the Entscheidungsproblem.” Proceedings of the London Mathematical Society S2–42:230–265.
Turing, Alan M. 1938. “On Computable Numbers, with an Application to the Entscheidungsproblem. A Correction.” Proceedings of the London Mathematical Society S2–43:544–546.
Tyson, Neil deGrasse, Michael Abram Strauss, and Richard J. Gott. 2016. Welcome to the Universe: An Astrophysical Tour. Princeton, NJ: Princeton University Press.
West, Geoffrey. 2017. Scale: The Universal Laws of Life, Growth, and Death in Organisms, Cities, and Companies. London: Weidenfeld & Nicolson.
Xiao, Han, Kashif Rasul, and Roland Vollgraf. 2017. “Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms.” August 28. https://arxiv.org/abs/1708.07747.
Broussard, Meredith. 2018. Artificial Unintelligence: How Computers Misunderstand the World. Cambridge, MA: MIT Press.
Christian, Brian, and Tom Griffiths. 2016. Algorithms to Live By: The Computer Science of Human Decisions. New York: Henry Holt and Company.
Cormen, Thomas H. 2013. Algorithms Unlocked. Cambridge, MA: MIT Press.
Cormen, Thomas H., Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms. 3rd ed. Cambridge, MA: MIT Press.
Denning, Peter J., and Matti Tedre. 2019. Computational Thinking. Cambridge, MA: MIT Press.
Dewdney, A. K. 1993. The (New) Turing Omnibus: 66 Excursions in Computer Science. New York: W. H. Freeman and Company.
Dyson, George. 2012. Turing’s Cathedral: The Origins of the Digital Universe. New York: Vintage Books.
Erwig, Martin. 2017. Once upon an Algorithm: How Stories Explain Computing. Cambridge, MA: MIT Press.
Fry, Hannah. 2018. Hello World: How to Be Human in the Age of the Machine. London: Doubleday.
Harel, David, and Yishai Feldman. 2004. Algorithmics: The Spirit of Computing. 3rd ed. Harlow, UK: Addison-Wesley.
Louridas, Panos. 2017. Real-World Algorithms: A Beginner’s Guide. Cambridge, MA: MIT Press.
MacCormick, John. 2013. Nine Algorithms That Changed the Future: The Ingenious Ideas That Drive Today’s Computers. Princeton, NJ: Princeton University Press.
O’Neil, Cathy. 2016. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. New York: Crown Publishing Group.
Petzold, Charles. 2008. The Annotated Turing: A Guided Tour through Alan Turing’s Historic Paper on Computability and the Turing Machine. Indianapolis: Wiley Publishing.
Sedgewick, Robert, and Kevin Wayne. 2017. Computer Science: An Interdisciplinary Approach. Boston: Addison-Wesley.
Activation function, 186
Acyclic graph, 52
Adenine, 53
Adjacency matrix, 158
Aesop, 65
Algebra, 4
Algorithm, etymology of, 4–5
Algorithmic age, 1
al-Khwārizmī, Muḥammad ibn Mūsā. See Khwārizmī, Muḥammad ibn Mūsā al-
Alphabet (company), 143
Altavista (search engine), 143
Approximate search, 81
Approximation algorithm, 41
Artificial neural network, 202
Atomic key, 108
Automatic differentiation, 228
Backlink, 151
Backpropagation algorithm, 211
Bank of England, 231
Base (logarithm), 37
Benzene, 52
Berners-Lee, Tim, 147
Bias (neuron), 186
Big O notation, 32
Binary search, 95–104
Bing (search engine), 143
Bossa-Nova (rhythm), 10
Brin, Sergey, 143
Bush, Vannevar, 147
Carney, Mark, 231
Categorical cross-entropy, 222
Census, US, 105–107
Central processing unit (CPU), 225
CERN (European Organization for Nuclear Research), 147
Chess, invention of, 35–36
Chromatic index, 63
Church, Alonzo, 241
Church-Turing thesis, 241
Circuit, graph, 48
Classifier, 192
Cognitive tools, 3
Columbia bell pattern (rhythm), 9
Composite key, 108
Computational complexity, 31
Constant complexity, 35
Control structure, 19
Cycle, graph, 51–52
Cytosine, 53
Dangling node, 167–171
Decision boundary, 190
Densely connected layers, 203
Derivative, 199
Dijkstra, Edsger, 68
Dijkstra’s algorithm, 68–78
Directed acyclic graph (dag), 52
Directed graph (digraph), 51
Divide-and-conquer method, 38
DNA assembly, 52–57
Edge, graph, 47
Edge coloring, 61
Edge weight, 66
EDVAC (Electronic Discrete Variable Automatic Computer), 142
Eigenvalue, 249
Eigenvector, 177
ENIAC (Electronic Numerical Integrator and Computer), 265n4 (chap. 3)
Epoch, 200
Euclid, 17
Euclid’s algorithm, 17
Euler, Leonhard, 44–47
Eulerian path or walk, 48
Eulerian tour or circuit, 48
Euler’s number, 38
Exa, 34
Exact search, 81
Excite (search engine), 143
Exponential complexity, 40
Exponential growth, 35–36
Factorial, 40
Factorial complexity, 40
“Garbage in, garbage out,” 78
Giga, 34
Global optimum, 63
Google matrix, 171–177
Googol, 34
Googolplex, 34
Gradient, 199
Graph, 47
Graphics processing unit (GPU), 225
Greatest common divisor (gcd), 16
Greedy algorithm, 62
Guanine, 53
Hawking, Stephen, xvii
Heuristic, 62
Hidden layer, 204
Hierholzer, Carl, 55
Hierholzer algorithm, 55
Hill climbing approach, 63
Hoare, Tony, 132
Hollerith, Herman, 105–106
Hotbot (search engine), 143
Hyperlink, 145
Hyperlink matrix, 158–167
Hyperplane, 201
Hypertext, 147
Image recognition, 212
Infoseek (search engine), 143
Insertion sort, 114–116
International Business Machines (IBM), 106
Intractable problems, 41
Iteration (control structure), 19
Kaliningrad, 43
Kekulé, August, 52
Kepler, Johannes, 91–92
Key, 108
Khwārizmī, Muḥammad ibn Mūsā al-, 4
Knuth, Donald, i–244
Königsberg, 43
Königsberg bridge problem, 44
Label (classification), 213
Layers, densely connected, 203
Linearly separable data, 201
Linear search, 84
Linear time algorithms, and complexity, 38
Link, 47
Linked list, 82
List, 82
List head, 83
Local optimum, 63
Logarithm, 37
Loglinear time algorithms, and complexity, 39
Loop (control structure), 19
Loss (machine learning), 195
Lycos (search engine), 143
Lydon, Christopher, 1
Machine learning, 194
Matrix, 158
Merge sort, 133–142
Merton, Robert King, 88
Microsoft, 143
Moore, Gordon, 36
Moore’s law, 36
Move-to-front algorithm, 89
Mpre rhythm, 10
Multigraph, 49
Multiset, 49
NASA (National Aeronautics and Space Administration), 144, 148
Natural logarithm, 38
Neuron, 182
Neutron source, 20
New York Times, 1
Node, graph, 47
Node, list, 83
Null, 83
Onset (rhythm), 9
Optimal stopping problem, 92
Overfitting, 201
Overflow, 102
Page, Larry, 143
PageRank, xx, 143–144, 176
PageRank vector, 159
Partial derivative, 199
Path, graph, 48
Path length, 66
Perceptron, 190
Permutation, 108
Peta, 34
Pivot, 124
Pointer, 83
Polynomial complexity, 39
Power method, 166
Programming, 21
Programming language, 21
Punched cards, 106
Quantum computer, 241
Qubit, 241
Quicksort, 123–133
Radio Open Source, 1
Radix sort, 116–123
Ramón y Cajal, Santiago, 183
Randomized algorithm, 133
Random surfer, 167–171
Rate of change, 198
Record, 108
Rectifier, 188
Relaxation, 68
ReLU (rectified linear unit), 190
Rhythm, in music, 9–14
Search space, 99
Secretary problem, 93
Selection (control structure), 19
Selection sort, 110–114
Self-organizing search, 90
Sequence (control structure), 19
Sequential search, 84
Shortest path, 66
Sigmoid, 188
Social network, 49
Softmax, 217
Sorting methods,
insertion, 114–116
merge, 133–142
quicksort, 123–133
radix, 116–123
selection, 110–114
string, 121
Spallation Neutron Source (SNS), 20–21
Spalling, 21
Sparse matrix, 178
String, 121
String sorting method, 121
Superposition, 241
Supervised learning, 192
Synapse, 184
Tabulating machine, 106
Tanh (hyperbolic tangent), 188
Tera, 34
Test data set, 194
Thymine, 53
Tour, graph, 48
Tournament scheduling, 57–65
Training (machine learning), 192
Training data set, 192
Transposition method, 90
Traveling salesman problem, 41
Turing, Alan, 231
Turing machine, 232–244
Unary numeral system, 236
United Nations, 144
Unsupervised learning, 194
US Census, 105–107
Vector, 159
Vertex, 47
Vertex coloring, 61
von Neumann, John (Neumann János Lajos), 142
Weight, edge, 66
Weighted input, 186
Weights (neuron), 186
Wikipedia, 144
Wilson, E. O., 34
Yotta, 34
Zetta, 34
The MIT Press Essential Knowledge Series
AI Ethics, Mark Coeckelbergh
Algorithms, Panos Louridas
Anticorruption, Robert I. Rotberg
Auctions, Timothy P. Hubbard and Harry J. Paarsch
The Book, Amaranth Borsuk
Carbon Capture, Howard J. Herzog
Citizenship, Dimitry Kochenov
Cloud Computing, Nayan B. Ruparelia
Collaborative Society, Dariusz Jemielniak and Aleksandra Przegalinska
Computational Thinking, Peter J. Denning and Matti Tedre
Computing: A Concise History, Paul E. Ceruzzi
The Conscious Mind, Zoltan E. Torey
Contraception, Donna Drucker
Critical Thinking, Jonathan Haber
Crowdsourcing, Daren C. Brabham
Cynicism, Ansgar Allen
Data Science, John D. Kelleher and Brendan Tierney
Deep Learning, John D. Kelleher
Extraterrestrials, Wade Roush
Extremism, J. M. Berger
Fake Photos, Hany Farid
fMRI, Peter A. Bandettini
Food, Fabio Parasecoli
Free Will, Mark Balaguer
The Future, Nick Montfort
GPS, Paul E. Ceruzzi
Haptics, Lynette A. Jones
Information and Society, Michael Buckland
Information and the Modern Corporation, James W. Cortada
Intellectual Property Strategy, John Palfrey
The Internet of Things, Samuel Greengard
Irony and Sarcasm, Roger Kreuz
Machine Learning: The New AI, Ethem Alpaydin
Machine Translation, Thierry Poibeau
Macroeconomics, Felipe Larraín B.
Memes in Digital Culture, Limor Shifman
Metadata, Jeffrey Pomerantz
The Mind–Body Problem, Jonathan Westphal
MOOCs, Jonathan Haber
Neuroplasticity, Moheb Costandi
Nihilism, Nolen Gertz
Open Access, Peter Suber
Paradox, Margaret Cuonzo
Post-Truth, Lee McIntyre
Quantum Entanglement, Jed Brody
Recommendation Engines, Michael Schrage
Recycling, Finn Arne Jørgensen
Robots, John Jordan
School Choice, David R. Garcia
Self-Tracking, Gina Neff and Dawn Nafus
Sexual Consent, Milena Popova
Smart Cities, Germaine R. Halegoua
Spaceflight, Michael J. Neufeld
Spatial Computing, Shashi Shekhar and Pamela Vold
Sustainability, Kent E. Portney
Synesthesia, Richard E. Cytowic
The Technological Singularity, Murray Shanahan
3D Printing, John Jordan
Understanding Beliefs, Nils J. Nilsson
Virtual Reality, Samuel Greengard
Waves, Frederic Raichlen
Panos Louridas is Associate Professor in the Department of Management Science and Technology at the Athens University of Economics and Business. He works on algorithmic applications, software engineering, security, practical cryptography, and applied machine learning. He is the author of Real-World Algorithms: A Beginner’s Guide, published by the MIT Press. He has been an active programmer for more than a quarter of a century.